More Compute Is What You Need

Summary

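Pre-training large language models is becoming increasingly expensive, and most practitioners rely on scaling laws to split a compute budget between model size and training tokens, an allocation commonly called Compute-Optimal or Chinchilla Optimal.
This paper hypothesizes a new scaling law under which the performance of transformer-based models depends mainly on the amount of compute spent, regardless of how that compute is allocated between model size and dataset size.
Using this unified scaling law, the paper predicts that (a) for inference efficiency, training should favor smaller models and larger training datasets, and (b) assuming available web datasets are exhausted, scaling model size may be the only way to further improve model performance.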

Abstract (original)

Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
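The trade-off in prediction (a) can be illustrated with a rough sketch. It rests on two rules of thumb that the abstract itself does not state and that are treated here purely as assumptions: training compute of about 6 * N * D FLOPs (N parameters, D training tokens) and inference cost of about 2 * N FLOPs per generated token. The function names and the 70B-parameter, 1.4T-token reference allocation are hypothetical and chosen only for illustration.

```python
# Assumption: training compute C is approximated as 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
# This rule of thumb is not taken from the abstract.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens


def tokens_for_budget(compute: float, n_params: float) -> float:
    """Training tokens that exhaust a fixed compute budget for a given model size."""
    return compute / (6.0 * n_params)


# Assumption: a forward pass costs roughly 2 FLOPs per parameter per token.
def inference_flops_per_token(n_params: float) -> float:
    return 2.0 * n_params


# Hypothetical reference allocation: 70B parameters, 1.4T tokens.
budget = training_flops(70e9, 1.4e12)

# Same training compute, different splits between model size and data.
for n_params in (70e9, 35e9, 7e9):
    n_tokens = tokens_for_budget(budget, n_params)
    print(
        f"N={n_params:.1e}  D={n_tokens:.2e}  "
        f"train FLOPs={training_flops(n_params, n_tokens):.2e}  "
        f"inference FLOPs/token={inference_flops_per_token(n_params):.1e}"
    )
```

Under the paper's hypothesis, every row in this loop would reach similar performance because the training compute is identical, while the smaller models are cheaper to serve. The sketch only makes the accounting explicit; it does not test the hypothesis.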

arXiv information

Author	Zhen Guo
Published	2024-04-30 12:05:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
