The lazy (NTK) and rich ($\mu$P) regimes: a gentle tutorial

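Abstract

A central theme of the modern machine learning paradigm is that larger neural networks achieve better performance on a variety of metrics.
Theoretical analyses of these overparameterized models have recently centered on the study of very wide neural networks.
This tutorial provides a nonrigorous but illustrative derivation of the following fact: to train wide networks effectively, there is only one degree of freedom in choosing hyperparameters such as the learning rate and the size of the initial weights.
This degree of freedom controls the richness of training behavior: at its minimum, the wide network trains lazily like a kernel machine, and at its maximum, it exhibits feature learning in the so-called $\mu$P regime.
The paper explains this richness scale, synthesizes recent research results into a coherent whole, offers new perspectives and intuitions, and provides empirical evidence supporting its claims.
In doing so, the author hopes to encourage further study of the richness scale, as it may be key to developing a scientific theory of feature learning in practical deep neural networks.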

Abstract (original)

A central theme of the modern machine learning paradigm is that larger neural networks achieve better performance on a variety of metrics. Theoretical analyses of these overparameterized models have recently centered around studying very wide neural networks. In this tutorial, we provide a nonrigorous but illustrative derivation of the following fact: in order to train wide networks effectively, there is only one degree of freedom in choosing hyperparameters such as the learning rate and the size of the initial weights. This degree of freedom controls the richness of training behavior: at minimum, the wide network trains lazily like a kernel machine, and at maximum, it exhibits feature learning in the so-called $\mu$P regime. In this paper, we explain this richness scale, synthesize recent research results into a coherent whole, offer new perspectives and intuitions, and provide empirical evidence supporting our claims. In doing so, we hope to encourage further study of the richness scale, as it may be key to developing a scientific theory of feature learning in practical deep neural networks.

arXiv information

Author	Dhruva Karkada
Published	2024-04-30 17:11:12+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
