AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

Summary

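Stochastic gradient descent (SGD) optimizers are commonly used to train convolutional neural networks (CNNs).
In recent years, several adaptive momentum-based SGD optimizers have been introduced, including Adam, diffGrad, Radam, and AdaBelief.
However, existing SGD optimizers do not exploit the gradient norms of past iterations, which leads to poor convergence and performance.
This paper proposes novel AdaNorm-based SGD optimizers that correct the norm of the gradient in each iteration based on the adaptive training history of the gradient norm.
As a result, the proposed optimizers maintain a high and representative gradient throughout training and address the problem of low and atypical gradients.
The proposed concept is generic and can be used with any existing SGD optimizer.
The efficacy of AdaNorm is shown with four state-of-the-art optimizers: Adam, diffGrad, Radam, and AdaBelief.
Performance improvements are shown using three CNN models (VGG16, ResNet18, and ResNet50) on three benchmark object recognition datasets (CIFAR10, CIFAR100, and TinyImageNet).
Code: https://github.com/shivram1987/AdaNorm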
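
The summary above describes correcting the gradient norm at each iteration using a history of past gradient norms. The following is a minimal sketch of one way that idea could look, not the authors' implementation (see the repository linked above for that). It assumes the history is an exponential moving average (EMA) of gradient norms and that a gradient is scaled up only when its norm falls below that average. The names `correct_gradient_norm`, `sgd_step`, and `gamma`, and the default value 0.95, are hypothetical and chosen for illustration.

```python
import math


def correct_gradient_norm(grad, norm_ema, gamma=0.95):
    """Return (corrected_grad, new_norm_ema) for one training step.

    Assumption: the "history" is an exponential moving average (EMA) of
    past gradient norms; gamma is a hypothetical smoothing factor.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    new_ema = gamma * norm_ema + (1.0 - gamma) * norm
    # Scale the gradient up only when its norm falls below the history,
    # so that unusually small gradients are lifted to the running average.
    if 0.0 < norm < new_ema:
        scale = new_ema / norm
        grad = [g * scale for g in grad]
    return grad, new_ema


def sgd_step(params, grad, lr=0.1):
    """Plain SGD update, shown only to mark where the correction plugs in."""
    return [p - lr * g for p, g in zip(params, grad)]


# Usage: a large gradient followed by a much smaller one.
params = [1.0, -2.0]
norm_ema = 0.0
for grad in ([3.0, 4.0], [0.03, 0.04]):
    corrected, norm_ema = correct_gradient_norm(grad, norm_ema)
    params = sgd_step(params, corrected)
```

In this sketch the first gradient passes through unchanged because its norm exceeds the running average, while the second is rescaled so that its norm matches the average. Because the correction only transforms the gradient before the update, the same function could in principle sit in front of any optimizer's update rule, which is consistent with the claim that the concept is generic.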

Summary (original)

The stochastic gradient descent (SGD) optimizers are generally used to train the convolutional neural networks (CNNs). In recent years, several adaptive momentum based SGD optimizers have been introduced, such as Adam, diffGrad, Radam and AdaBelief. However, the existing SGD optimizers do not exploit the gradient norm of past iterations and lead to poor convergence and performance. In this paper, we propose novel AdaNorm based SGD optimizers by correcting the norm of gradient in each iteration based on the adaptive training history of gradient norm. By doing so, the proposed optimizers are able to maintain high and representative gradient throughout the training and solve the low and atypical gradient problems. The proposed concept is generic and can be used with any existing SGD optimizer. We show the efficacy of the proposed AdaNorm with four state-of-the-art optimizers, including Adam, diffGrad, Radam and AdaBelief. We depict the performance improvement due to the proposed optimizers using three CNN models, including VGG16, ResNet18 and ResNet50, on three benchmark object recognition datasets, including CIFAR10, CIFAR100 and TinyImageNet. Code: \url{https://github.com/shivram1987/AdaNorm}.

arXiv information

Authors	Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri
Published	2022-10-12 16:17:25+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
