MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization

Summary

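Extensive experiments have confirmed that Batch Normalization (BN) layers benefit both convergence and generalization.
However, BN requires extra memory and floating-point computation.
Moreover, because BN depends on batch statistics, it becomes inaccurate on micro-batches.
The paper addresses these problems by simplifying BN while preserving two of its fundamental effects: data decorrelation and an adaptive learning rate.
It proposes a new normalization method, MimicNorm, to improve the convergence and efficiency of network training.
MimicNorm consists of only two lightweight operations: a modified weight mean operation (subtracting the mean value from the weight parameter tensor) and a single BN layer placed before the loss function (the last BN layer).
Using neural tangent kernel (NTK) theory, the authors prove that the weight mean operation whitens activations and moves the network into the chaotic regime, as BN layers do, which in turn improves convergence.
The last BN layer provides automatically tuned learning rates and also improves accuracy.
Experimental results show that MimicNorm achieves similar accuracy across various network structures, including ResNets and lightweight networks such as ShuffleNet, while reducing memory consumption by about 20%.
The code is publicly available at https://github.com/Kid-key/MimicNorm.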
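
The two operations named in the summary can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the mean is taken per output unit over that unit's input weights, which the summary does not specify, and it uses a plain BN without learned scale or shift for the last layer. The function names and toy data are hypothetical.

```python
import math


def center_weights(weight):
    """Subtract each row's mean so every output unit's weights sum to zero.

    Assumption: the mean is taken per output unit (per row). The paper's
    "modified" weight mean operation may differ in detail.
    """
    centered = []
    for row in weight:
        mean = sum(row) / len(row)
        centered.append([w - mean for w in row])
    return centered


def linear(weight, x):
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def last_batch_norm(logits, eps=1e-5):
    """Normalize each logit column across the batch (no learned scale/shift)."""
    batch_size = len(logits)
    n_out = len(logits[0])
    out = [[0.0] * n_out for _ in range(batch_size)]
    for j in range(n_out):
        col = [row[j] for row in logits]
        mean = sum(col) / batch_size
        var = sum((v - mean) ** 2 for v in col) / batch_size
        for i in range(batch_size):
            out[i][j] = (logits[i][j] - mean) / math.sqrt(var + eps)
    return out


# Hypothetical toy data: 2 output units, 3 inputs, batch of 4 samples.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
batch = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0], [2.0, 2.0, -2.0], [5.0, 0.5, 1.0]]

centered = center_weights(weight)               # weight mean operation
logits = [linear(centered, x) for x in batch]   # forward pass, no per-layer BN
normalized = last_batch_norm(logits)            # single BN before the loss
```

Because each centered row sums to zero, adding the same constant to every input leaves the output of `linear` unchanged, which is one simple way to see how centering the weights removes the shared mean component of the activations. The last BN then rescales the logits using batch statistics once, rather than after every layer.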

Summary (original)

Substantial experiments have validated the success of Batch Normalization (BN) Layer in benefiting convergence and generalization. However, BN requires extra memory and float-point calculation. Moreover, BN would be inaccurate on micro-batch, as it depends on batch statistics. In this paper, we address these problems by simplifying BN regularization while keeping two fundamental impacts of BN layers, i.e., data decorrelation and adaptive learning rate. We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training. MimicNorm consists of only two light operations, including modified weight mean operations (subtract mean values from weight parameter tensor) and one BN layer before loss function (last BN layer). We leverage the neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transits network into the chaotic regime like BN layer, and consequently, leads to an enhanced convergence. The last BN layer provides autotuned learning rates and also improves accuracy. Experimental results show that MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption. The code is publicly available at https://github.com/Kid-key/MimicNorm.

arXiv information

Authors	Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
Published	2023-09-27 11:38:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
