Continuous-Time Analysis of Adaptive Optimization and Normalization

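Summary

Adaptive optimization algorithms, in particular Adam and its variant AdamW, are fundamental components of modern deep learning.
However, their training dynamics are not yet comprehensively understood in theory, and there is limited insight into why common practices, such as particular hyperparameter choices and normalization layers, contribute to successful generalization.
This work presents a continuous-time formulation of Adam and AdamW that makes the training dynamics tractable to analyze and sheds light on these practical questions.
On the theory side, the authors derive a stable region for Adam's hyperparameters $(\beta, \gamma)$ that guarantees bounded updates, and they verify the prediction empirically by observing unstable, exponential growth of parameter updates outside this region.
They further give a theoretical justification for the success of normalization layers by uncovering an implicit meta-adaptive effect of scale-invariant architectural components.
This insight leads to an explicit optimizer, $2$-Adam, which is generalized to $k$-Adam: an optimizer that applies an adaptive normalization procedure $k$ times, encompassing Adam (corresponding to $k=1$) and Adam with a normalization layer (corresponding to $k=2$).
Overall, the continuous-time formulation of Adam enables principled analysis and offers a deeper understanding of optimal hyperparameter choices and architectural decisions in modern deep learning.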

Summary (original)

Adaptive optimization algorithms, particularly Adam and its variant AdamW, are fundamental components of modern deep learning. However, their training dynamics lack comprehensive theoretical understanding, with limited insight into why common practices – such as specific hyperparameter choices and normalization layers – contribute to successful generalization. This work presents a continuous-time formulation of Adam and AdamW, facilitating a tractable analysis of training dynamics that can shed light on such practical questions. We theoretically derive a stable region for Adam’s hyperparameters $(\beta, \gamma)$ that ensures bounded updates, empirically verifying these predictions by observing unstable exponential growth of parameter updates outside this region. Furthermore, we theoretically justify the success of normalization layers by uncovering an implicit meta-adaptive effect of scale-invariant architectural components. This insight leads to an explicit optimizer, $2$-Adam, which we generalize to $k$-Adam – an optimizer that applies an adaptive normalization procedure $k$ times, encompassing Adam (corresponding to $k=1$) and Adam with a normalization layer (corresponding to $k=2$). Overall, our continuous-time formulation of Adam facilitates a principled analysis, offering deeper understanding of optimal hyperparameter choices and architectural decisions in modern deep learning.
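For readers who want a concrete reference point, the following is a minimal sketch of the standard discrete-time Adam/AdamW update on a scalar parameter. It is not the continuous-time formulation of the paper, and it does not implement $k$-Adam; the abstract does not spell out how $(\beta, \gamma)$ map onto an implementation, so the conventional names `beta1`, `beta2`, and `lr` are used here as assumptions. The function names `adamw_step` and `minimize_quadratic` and the toy objective are hypothetical, chosen only for illustration. The sketch also records the size of the normalized update, the quantity whose boundedness the abstract refers to.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One standard discrete-time Adam/AdamW step on a scalar parameter.

    weight_decay=0.0 gives plain Adam; a positive value applies the
    decoupled AdamW decay. t is the 1-based step count.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment moving average
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment moving average
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    update = m_hat / (math.sqrt(v_hat) + eps)     # adaptively normalized update
    theta = theta - lr * (update + weight_decay * theta)
    return theta, m, v, update


def minimize_quadratic(steps=2000, lr=1e-2):
    """Run Adam on the toy objective f(theta) = 0.5 * theta**2 from theta = 1."""
    theta, m, v = 1.0, 0.0, 0.0
    max_update = 0.0
    for t in range(1, steps + 1):
        grad = theta  # gradient of 0.5 * theta**2
        theta, m, v, update = adamw_step(theta, grad, m, v, t, lr=lr)
        max_update = max(max_update, abs(update))  # track largest update size
    return theta, max_update


if __name__ == "__main__":
    final_theta, largest = minimize_quadratic()
    print(f"final theta: {final_theta:.6f}, largest |update|: {largest:.4f}")
```

With the default hyperparameters this toy run stays well behaved; exploring settings where the updates grow would require the stable region derived in the paper, which the abstract does not state.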

arXiv information

Authors	Rhys Gould, Hidenori Tanaka
Published	2024-11-08 18:07:55+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
