Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

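Summary

Scaling laws for large language models (LLMs) predict model performance from parameters such as model size and training data.
However, differences in training configurations and data processing across model families cause large variations in benchmark performance, which makes it hard for a single scaling law to generalize to all LLMs.
Fitting a family-specific scaling law, on the other hand, requires training models of several sizes for every family.
This work proposes Skills Scaling Laws (SSLaws, pronounced "Sloth"), a new scaling law that uses publicly available benchmark data and assumes that LLM performance is driven by low-dimensional latent skills such as reasoning and instruction following.
These latent skills are influenced by computational resources such as model size and training tokens, with efficiencies that vary across model families.
Sloth exploits correlations across benchmarks to give more accurate and interpretable predictions while reducing the need to train multiple LLMs per family.
The authors present theoretical results on parameter identification and empirical evaluations on 12 prominent benchmarks from Open LLM Leaderboard v1/v2, showing that Sloth predicts LLM performance efficiently and offers insight into scaling behavior on downstream tasks such as coding and emotional intelligence applications.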

Abstract (original)

Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant variations in benchmark performance, making it difficult for a single scaling law to generalize across all LLMs. On the other hand, training family-specific scaling laws requires training models of varying sizes for every family. In this work, we propose Skills Scaling Laws (SSLaws, pronounced as Sloth), a novel scaling law that leverages publicly available benchmark data and assumes LLM performance is driven by low-dimensional latent skills, such as reasoning and instruction following. These latent skills are influenced by computational resources like model size and training tokens but with varying efficiencies across model families. Sloth exploits correlations across benchmarks to provide more accurate and interpretable predictions while alleviating the need to train multiple LLMs per family. We present both theoretical results on parameter identification and empirical evaluations on 12 prominent benchmarks, from Open LLM Leaderboard v1/v2, demonstrating that Sloth predicts LLM performance efficiently and offers insights into scaling behaviors for downstream tasks such as coding and emotional intelligence applications.
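
The idea in the abstract, that benchmark scores are driven by a few latent skills which grow with model size and training tokens at family-dependent efficiency, can be illustrated with a toy model. The sketch below is not the authors' formulation: the function `predict_scores`, the log-linear skill model, the use of per-family intercepts to stand in for "efficiency", the sigmoid link, and every numeric value are assumptions chosen only for illustration.

```python
import math


def predict_scores(log_params, log_tokens, family_intercepts,
                   skill_slopes, loadings, biases):
    """Toy latent-skill scaling sketch (hypothetical, not the paper's model).

    Each latent skill is a family-specific intercept plus a shared
    linear function of log model size and log training tokens.
    Each benchmark score is a sigmoid of a weighted sum of skills.
    """
    skills = [
        intercept + slope_p * log_params + slope_t * log_tokens
        for intercept, (slope_p, slope_t) in zip(family_intercepts, skill_slopes)
    ]
    scores = []
    for row, bias in zip(loadings, biases):
        z = bias + sum(w * s for w, s in zip(row, skills))
        scores.append(1.0 / (1.0 + math.exp(-z)))  # squash into (0, 1)
    return scores


# Assumed shared parameters: 2 latent skills, 3 benchmarks.
SLOPES = [(0.30, 0.20), (0.10, 0.25)]             # (size, tokens) per skill
LOADINGS = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # benchmark x skill weights
BIASES = [-12.0, -10.5, -9.0]

# Assumed family-specific intercepts (one value per skill).
FAMILY_A = [0.0, 0.0]
FAMILY_B = [0.5, -0.2]

small = predict_scores(math.log(1e9), math.log(1e12),
                       FAMILY_A, SLOPES, LOADINGS, BIASES)
large = predict_scores(math.log(7e10), math.log(2e12),
                       FAMILY_A, SLOPES, LOADINGS, BIASES)
other = predict_scores(math.log(1e9), math.log(1e12),
                       FAMILY_B, SLOPES, LOADINGS, BIASES)
```

Because the slopes and loadings are shared across families in this sketch, only the small intercept vector has to be estimated for a new family, which is one way to read the abstract's claim about reducing the need to train multiple LLMs per family.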

arXiv information

Authors	Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin
Published	2024-12-09 14:51:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
