Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

Summary

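The inductive bias and generalization properties of large machine learning models are, to a substantial extent, a byproduct of the optimization algorithm used for training.
Among other factors, the scale of the random initialization, the learning rate, and early stopping all have a crucial impact on the quality of the model learned by stochastic gradient descent or related algorithms.
To understand these phenomena, we study the training dynamics of large two-layer neural networks.
We use a well-established technique from non-equilibrium statistical physics (dynamical mean-field theory) to obtain an asymptotic high-dimensional characterization of these dynamics.
This characterization applies to a Gaussian approximation of the hidden-neuron nonlinearity, and it empirically captures the behavior of actual neural network models well.
Our analysis uncovers several interesting new phenomena in the training dynamics: $(i)$ the emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity;
$(ii)$ as a consequence, an algorithmic inductive bias towards small complexity, but only if the initialization has small enough complexity;
$(iii)$ a separation of time scales between feature learning and overfitting;
$(iv)$ a non-monotone behavior of the test error and, correspondingly, a 'feature unlearning' phase at large times.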
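The summary refers to a Gaussian approximation of the hidden-neuron nonlinearity but does not spell out its form. One widely used construction, assumed here and not necessarily the one used in the paper, replaces $\sigma(z)$ for $z \sim N(0,1)$ by $\mu_0 + \mu_1 z + \mu_* g$, where $g$ is an independent standard Gaussian, $\mu_0 = E[\sigma(z)]$, $\mu_1 = E[z\,\sigma(z)]$ and $\mu_*^2 = E[\sigma(z)^2] - \mu_0^2 - \mu_1^2$. The minimal sketch below computes these coefficients with probabilists' Gauss-Hermite quadrature; the helper name `gaussian_equivalent_coeffs` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_equivalent_coeffs(sigma, n_nodes=100):
    """Return (mu0, mu1, mu_star) such that, for z ~ N(0, 1), the surrogate
    mu0 + mu1 * z + mu_star * g (g ~ N(0, 1) independent of z) matches
    sigma(z) in mean, covariance with z, and variance.

    Hypothetical helper for illustration; an assumed form of the Gaussian
    approximation, not code taken from the paper.
    """
    # Probabilists' Gauss-Hermite rule integrates against exp(-x^2 / 2);
    # dividing the weights by sqrt(2*pi) turns sums into N(0, 1) expectations.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)

    values = sigma(nodes)
    mu0 = np.sum(weights * values)               # E[sigma(z)]
    mu1 = np.sum(weights * nodes * values)       # E[z * sigma(z)]
    second_moment = np.sum(weights * values**2)  # E[sigma(z)^2]
    # Variance left over after removing the constant and linear parts
    # (clipped at 0 to absorb tiny negative rounding errors).
    mu_star_sq = max(second_moment - mu0**2 - mu1**2, 0.0)
    return mu0, mu1, np.sqrt(mu_star_sq)


if __name__ == "__main__":
    relu = lambda z: np.maximum(z, 0.0)
    for name, fn in [("tanh", np.tanh), ("relu", relu)]:
        mu0, mu1, mu_star = gaussian_equivalent_coeffs(fn)
        print(f"{name}: mu0={mu0:.4f} mu1={mu1:.4f} mu_star={mu_star:.4f}")
```

For ReLU the exact values are $\mu_0 = 1/\sqrt{2\pi} \approx 0.3989$, $\mu_1 = 1/2$ and $\mu_* = \sqrt{1/4 - 1/(2\pi)} \approx 0.3014$, so the quadrature output can be checked against them; for odd activations such as tanh, $\mu_0 = 0$.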

Summary (original)

The inductive bias and generalization properties of large machine learning models are, to a substantial extent, a byproduct of the optimization algorithm used for training. Among others, the scale of the random initialization, the learning rate, and early stopping all have crucial impact on the quality of the model learnt by stochastic gradient descent or related algorithms. In order to understand these phenomena, we study the training dynamics of large two-layer neural networks. We use a well-established technique from non-equilibrium statistical physics (dynamical mean field theory) to obtain an asymptotic high-dimensional characterization of this dynamics. This characterization applies to a Gaussian approximation of the hidden neurons non-linearity, and empirically captures well the behavior of actual neural network models. Our analysis uncovers several interesting new phenomena in the training dynamics: $(i)$ The emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity; $(ii)$ As a consequence, algorithmic inductive bias towards small complexity, but only if the initialization has small enough complexity; $(iii)$ A separation of time scales between feature learning and overfitting; $(iv)$ A non-monotone behavior of the test error and, correspondingly, a 'feature unlearning' phase at large times.
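The abstract's remarks about initialization scale, learning rate and early stopping can be explored empirically on a toy problem. The sketch below is not the paper's dynamical mean-field analysis or its exact model: it trains a small two-layer tanh network with full-batch gradient descent on a noisy single-index teacher and records training and test error at every step. All sizes, the parameter `init_scale`, and the function name `train_two_layer` are hypothetical choices, and at such small sizes there is no guarantee that a train/test gap or a non-monotone test curve will actually appear.

```python
import numpy as np


def train_two_layer(n_train=60, n_test=500, d=30, width=200, steps=2000,
                    lr=0.1, init_scale=1.0, noise=0.3, seed=0):
    """Full-batch gradient descent on
    f(x) = sum_j a_j * tanh(w_j . x / sqrt(d)) / sqrt(width).

    Hypothetical toy setup (not the paper's model). Returns arrays of
    per-step training and test mean-squared errors.
    """
    rng = np.random.default_rng(seed)

    # Noisy single-index teacher: y = tanh(beta . x / sqrt(d)) + noise.
    beta = rng.standard_normal(d)

    def sample(n):
        x = rng.standard_normal((n, d))
        y = np.tanh(x @ beta / np.sqrt(d)) + noise * rng.standard_normal(n)
        return x, y

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(n_test)

    # Student parameters; init_scale sets the size of the random initialization.
    W = init_scale * rng.standard_normal((width, d))
    a = init_scale * rng.standard_normal(width)

    def forward(x):
        h = np.tanh(x @ W.T / np.sqrt(d))   # hidden activations, shape (n, width)
        return h, h @ a / np.sqrt(width)    # network outputs, shape (n,)

    train_mse, test_mse = [], []
    for _ in range(steps):
        h, f = forward(x_tr)
        r = f - y_tr                        # training residuals
        train_mse.append(np.mean(r**2))
        test_mse.append(np.mean((forward(x_te)[1] - y_te) ** 2))

        # Gradients of L = mean(r^2) / 2 with respect to a and W.
        grad_a = h.T @ r / (n_train * np.sqrt(width))
        back = (r[:, None] * (1.0 - h**2)) * a[None, :] / np.sqrt(width)
        grad_W = back.T @ x_tr / (n_train * np.sqrt(d))

        # In-place updates, so forward() sees the new parameters.
        a -= lr * grad_a
        W -= lr * grad_W

    return np.array(train_mse), np.array(test_mse)
```

As a usage example, `tr, te = train_two_layer()` followed by `int(np.argmin(te))` gives the step with the lowest test error, i.e. the early-stopping point of this toy run. Rerunning with a smaller `init_scale` is one simple way to probe the abstract's point that the bias towards small complexity depends on the complexity of the initialization, though the toy results need not mirror the paper's asymptotic findings.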

arXiv information

Authors	Andrea Montanari, Pierfrancesco Urbani
Published	2025-02-28 17:45:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
