An exactly solvable model for emergence and scaling laws

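Summary

Deep learning models can appear to gain a sudden ability to solve a new problem as training time ($T$), training data ($D$), or model size ($N$) increases. This phenomenon is known as emergence.
In this paper, we introduce a framework in which each new ability (a skill) is represented as a basis function.
We solve a simple multi-linear model in this skill basis and obtain analytic expressions for the emergence of new skills and for the scaling laws of the loss with training time, data size, model size, and optimal compute ($C$).
We compare these detailed calculations with direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset follow a power-law distribution.
Using a single fit parameter, our simple model captures the sigmoidal emergence of multiple new skills in the neural network as training time, data size, or model size increases.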

Summary (original)

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time ($T$), training data ($D$), or model size ($N$) increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
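
To make the multitask sparse parity setup mentioned in the abstract more concrete, here is a minimal sketch of how such a dataset could be generated. It is an illustration under stated assumptions, not the authors' code: the input layout (one-hot task bits followed by random data bits), the exponent form $k^{-(\alpha+1)}$ for the task frequencies, and all names and sizes (`power_law_weights`, `make_task_subsets`, `sample_example`, `N_TASKS`, `N_BITS`, `SUBSET_SIZE`, `ALPHA`) are hypothetical choices made for this example.

```python
import random


def power_law_weights(n_tasks, alpha):
    """Unnormalized task frequencies, assumed proportional to k^-(alpha+1)."""
    return [k ** -(alpha + 1.0) for k in range(1, n_tasks + 1)]


def make_task_subsets(n_tasks, n_bits, subset_size, rng):
    """Fix, once per task, which data bits enter that task's parity."""
    return [sorted(rng.sample(range(n_bits), subset_size)) for _ in range(n_tasks)]


def sample_example(subsets, weights, n_bits, rng):
    """Draw one (input, label, task) triple.

    The input is a one-hot task indicator followed by uniformly random data
    bits (an assumed layout). The label is the parity of the data bits in
    the chosen task's subset.
    """
    n_tasks = len(subsets)
    # Rare tasks (large index) are sampled less often than common ones.
    task = rng.choices(range(n_tasks), weights=weights, k=1)[0]
    control = [1 if i == task else 0 for i in range(n_tasks)]
    data = [rng.randint(0, 1) for _ in range(n_bits)]
    label = sum(data[i] for i in subsets[task]) % 2
    return control + data, label, task


# Hypothetical sizes chosen only for illustration.
N_TASKS, N_BITS, SUBSET_SIZE, ALPHA = 5, 16, 3, 0.6

rng = random.Random(0)
weights = power_law_weights(N_TASKS, ALPHA)
subsets = make_task_subsets(N_TASKS, N_BITS, SUBSET_SIZE, rng)
dataset = [sample_example(subsets, weights, N_BITS, rng) for _ in range(1000)]
```

In this sketch each task plays the role of one skill. Because frequent tasks dominate the samples, a network trained on such data would be expected to pick up common skills first and rare ones later, which is the setting in which the abstract describes sigmoidal emergence.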

arXiv information

Authors	Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Ard Louis
Published	2024-04-26 17:45:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
