SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

Summary

We present SmolTulu-1.7b-Instruct (referred to in this report as SmolTulu-DPO-1130), an instruction-tuned language model that adapts AllenAI's Tulu 3 post-training pipeline to enhance Huggingface's SmolLM2-1.7B base model.
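Through a comprehensive empirical analysis using a 135M-parameter model, we demonstrate that the relationship between learning rate and batch size significantly affects model performance, and that the effect depends on the task.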
Our findings reveal a clear split: reasoning tasks such as ARC and GSM8K benefit from higher learning-rate-to-batch-size ratios, whereas pattern-recognition tasks such as HellaSwag and IFEval perform best with lower ratios.
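The summary characterizes training runs by the ratio of learning rate to batch size. As a minimal sketch, assuming "batch size" means the effective batch size (per-device batch size × gradient-accumulation steps × number of devices), the snippet below computes this ratio and ranks a few configurations by it. The configuration names and values are hypothetical and are not the hyperparameters used for SmolTulu.

```python
# Hypothetical configurations for illustration only; they are NOT the
# settings reported for SmolTulu.
configs = [
    {"name": "low_ratio", "learning_rate": 5e-6, "batch_size": 128},
    {"name": "mid_ratio", "learning_rate": 2e-5, "batch_size": 64},
    {"name": "high_ratio", "learning_rate": 9e-5, "batch_size": 8},
]


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Assumption: the batch size in the ratio is the effective batch size."""
    return per_device * grad_accum_steps * num_devices


def lr_bs_ratio(learning_rate: float, batch_size: int) -> float:
    """Return the learning-rate-to-batch-size ratio for one configuration."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size


def rank_by_ratio(configs: list[dict]) -> list[str]:
    """Return configuration names ordered from highest to lowest LR/BS ratio."""
    ranked = sorted(
        configs,
        key=lambda c: lr_bs_ratio(c["learning_rate"], c["batch_size"]),
        reverse=True,
    )
    return [c["name"] for c in ranked]


if __name__ == "__main__":
    for c in configs:
        ratio = lr_bs_ratio(c["learning_rate"], c["batch_size"])
        print(f"{c['name']}: lr/bs = {ratio:.3e}")
    print("ranking:", rank_by_ratio(configs))
```

Under the summary's finding, configurations near the top of this ranking would be the candidates to favor for reasoning-oriented tasks, and those near the bottom for pattern-recognition tasks; the sketch only computes the ratio and makes no claim about which values work best.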
These insights informed the development of SmolTulu, which achieves state-of-the-art performance among sub-2B-parameter models on instruction following, scoring 67.7% on IFEval ($\Delta$11%), and on mathematical reasoning, scoring 51.6% on GSM8K ($\Delta$3.4%); an alternate version scores 57.1% on ARC ($\Delta$5.4%).
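We release our model, training recipes, and ablation studies to facilitate further research on efficient model alignment, and we demonstrate that carefully adapting optimization dynamics can help bridge the capability gap between small and large language models.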

Summary (original)

We present SmolTulu-1.7b-Instruct, referenced in this report as SmolTulu-DPO-1130, an instruction-tuned language model that adapts AllenAI’s Tulu 3 post-training pipeline to enhance Huggingface’s SmolLM2-1.7B base model. Through comprehensive empirical analysis using a 135M parameter model, we demonstrate that the relationship between learning rate and batch size significantly impacts model performance in a task-dependent manner. Our findings reveal a clear split: reasoning tasks like ARC and GSM8K benefit from higher learning rate to batch size ratios, while pattern recognition tasks such as HellaSwag and IFEval show optimal performance with lower ratios. These insights informed the development of SmolTulu, which achieves state-of-the-art performance among sub-2B parameter models on instruction following, scoring 67.7% on IFEval ($\Delta$11%), and on mathematical reasoning, scoring 51.6% on GSM8K ($\Delta$3.4%), with an alternate version scoring 57.1% on ARC ($\Delta$5.4%). We release our model, training recipes, and ablation studies to facilitate further research in efficient model alignment, demonstrating that careful adaptation of optimization dynamics can help bridge the capability gap between small and large language models.

arXiv information

Author	Sultan Alrashed
Published	2024-12-11 12:41:36+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
