Observational Scaling Laws and the Predictability of Language Model Performance

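Abstract

Understanding how language model performance changes with scale is important for developing benchmarks and algorithms.
Scaling laws are one approach to building this understanding, but their use has been limited because they require training models across many different scales.
We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from about 80 publicly available models.
Building a single scaling law from multiple model families is difficult because training compute efficiency and capabilities vary widely across families.
However, we show that these variations are consistent with a simple, generalized scaling law in which language model performance is a function of a low-dimensional capability space, and model families differ only in how efficiently they convert training compute into capabilities.
Using this approach, we show that complex scaling phenomena are surprisingly predictable: several emergent phenomena follow smooth sigmoidal behavior and can be predicted from small models.
We show that the agent performance of models such as GPT-4 can be accurately predicted from simpler non-agentic benchmarks.
We also show how to predict the impact of post-training interventions such as Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.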

Abstract (original)

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publicly available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
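
The abstract describes performance as a function of a low-dimensional capability space and notes that some downstream metrics follow a smooth sigmoid. The sketch below is one plausible reading of that idea, not the authors' implementation: it assumes the capability space can be approximated by principal components of a models-by-benchmarks score matrix, and that a downstream accuracy is a sigmoid of a linear function of those components. All data are synthetic, and every function name (`fit_capability_space`, `project`, `fit_sigmoid_link`, `predict`) is hypothetical.

```python
import numpy as np

def fit_capability_space(scores, k):
    """PCA via SVD. Rows are models, columns are benchmark scores (assumed layout)."""
    mean = scores.mean(axis=0)
    _, _, vt = np.linalg.svd(scores - mean, full_matrices=False)
    return mean, vt[:k]  # top-k principal directions, shape (k, n_benchmarks)

def project(scores, mean, components):
    """Map benchmark scores to k-dimensional capability coordinates."""
    return (scores - mean) @ components.T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sigmoid_link(capabilities, accuracy, eps=1e-6):
    """Least-squares fit of logit(accuracy) = capabilities @ beta + alpha."""
    y = np.clip(accuracy, eps, 1.0 - eps)
    logit = np.log(y / (1.0 - y))
    design = np.column_stack([capabilities, np.ones(len(capabilities))])
    coef, *_ = np.linalg.lstsq(design, logit, rcond=None)
    return coef[:-1], coef[-1]

def predict(capabilities, beta, alpha):
    return sigmoid(capabilities @ beta + alpha)

# Synthetic stand-in for ~80 public models scored on 6 benchmarks (assumption).
rng = np.random.default_rng(0)
n_models, n_benchmarks = 80, 6
latent = rng.normal(size=(n_models, 1))  # hidden 1-D capability
loadings = rng.uniform(0.5, 1.5, size=(1, n_benchmarks))
scores = latent @ loadings + 0.05 * rng.normal(size=(n_models, n_benchmarks))
downstream = sigmoid(2.0 * latent[:, 0] - 1.0)  # synthetic "emergent" metric

# Fit on the weaker models only, then extrapolate to the stronger ones.
order = np.argsort(scores.mean(axis=1))
train, test = order[:60], order[60:]
mean, comps = fit_capability_space(scores[train], k=1)
beta, alpha = fit_sigmoid_link(project(scores[train], mean, comps), downstream[train])
pred = predict(project(scores[test], mean, comps), beta, alpha)
max_err = float(np.max(np.abs(pred - downstream[test])))
print(f"max abs error on held-out stronger models: {max_err:.4f}")
```

The train/test split by average benchmark score mimics predicting larger models from smaller ones. The sign of a principal component is arbitrary, which is harmless here because the fitted `beta` absorbs it.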

arXiv information

Authors	Yangjun Ruan,Chris J. Maddison,Tatsunori Hashimoto
Published	2024-05-17 17:49:44+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
