Matryoshka Model Learning for Improved Elastic Student Models

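Abstract

Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development.
In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe.
TA models are larger, higher-capacity versions of the Student models, so they help the Student models relate better to the Teacher model and also bring in more domain-specific expertise.
Furthermore, multiple accurate Student models can be extracted from the TA model.
Therefore, despite requiring only one training run, the methodology provides multiple servable options for trading off accuracy against lower serving cost.
We demonstrate the proposed method, MatTA, on proprietary datasets and models.
Its practical efficacy is underscored by live A/B tests within a production ML system, which show a 20% improvement on a key metric.
We also demonstrate the method on GPT-2 Medium, a public model, achieving relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.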

Abstract (original)

Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better relate to the Teacher model and also bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options to trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.

arXiv information

Authors	Chetan Verma, Aditya Srinivas Timmaraju, Cho-Jui Hsieh, Suyash Damle, Ngot Bui, Yang Zhang, Wen Chen, Xin Liu, Prateek Jain, Inderjit S Dhillon
Published	2025-06-02 09:31:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
