FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

Summary

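Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer a strong efficiency-performance trade-off by activating only a fraction of the model per token.
However, academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior.
We release FLAME-MoE, a fully open-source research suite of seven decoder-only models ranging from 38M to 1.7B active parameters.
All training data pipelines, scripts, logs, and checkpoints are publicly available to enable reproducible experiments.
Across six evaluation tasks, FLAME-MoE improves average accuracy by up to 3.4 points over dense baselines trained with identical FLOPs.
Leveraging full transparency of the training traces, we present initial analyses showing that (i) experts increasingly specialize on distinct token subsets, (ii) co-activation matrices remain sparse, and (iii) routing behavior stabilizes early in training.
All code, training logs, and model checkpoints are available at https://github.com/cmu-flame/FLAME-MoE.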
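Two of the analyses the abstract mentions, sparse expert co-activation and early routing stabilization, can be illustrated from per-token routing decisions. The following is a minimal Python sketch, assuming each token's routing decision is available as a list of selected expert indices. The function names and the two summary metrics (fraction of expert pairs ever co-selected, and mean per-token Jaccard overlap between two checkpoints) are hypothetical illustrations, not the definitions used in the paper or its repository.

```python
from itertools import combinations

# Hypothetical analysis helpers for illustration; not the FLAME-MoE codebase.
# Assumed input format: one list of selected expert indices per token.


def co_activation_matrix(selections, num_experts):
    """Count how often each pair of experts is selected for the same token.

    Off-diagonal entry [i][j] counts tokens routed to both i and j; the
    diagonal holds each expert's total selection count.
    """
    matrix = [[0] * num_experts for _ in range(num_experts)]
    for experts in selections:
        chosen = sorted(set(experts))
        for e in chosen:
            matrix[e][e] += 1
        for i, j in combinations(chosen, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def pair_density(matrix):
    """Fraction of distinct expert pairs (i < j) co-selected at least once.

    Lower values mean a sparser co-activation pattern (hypothetical metric).
    """
    n = len(matrix)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    used = sum(1 for i in range(n) for j in range(i + 1, n) if matrix[i][j] > 0)
    return used / pairs


def routing_overlap(selections_a, selections_b):
    """Mean per-token Jaccard overlap of expert picks at two checkpoints.

    Values near 1.0 mean tokens keep going to the same experts, i.e. routing
    has stabilized (hypothetical metric).
    """
    if len(selections_a) != len(selections_b):
        raise ValueError("both checkpoints must route the same tokens")
    if not selections_a:
        return 0.0
    total = 0.0
    for a, b in zip(selections_a, selections_b):
        a, b = set(a), set(b)
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(selections_a)


# Toy routing records: 4 tokens, 8 experts, top-2 picks (made-up data).
early = [[0, 1], [0, 1], [2, 3], [4, 5]]
late = [[0, 1], [0, 2], [2, 3], [4, 5]]
m = co_activation_matrix(late, num_experts=8)
print(m[0][1], m[0][0])                        # 1 2
print(round(pair_density(m), 3))               # 0.143
print(round(routing_overlap(early, late), 3))  # 0.833
```

With 64 experts and top-8 routing the same functions apply unchanged; only the size of the matrix and of each selection list grows.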

Summary (original)

Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior. We release FLAME-MoE, a completely open-source research suite composed of seven decoder-only models, ranging from 38M to 1.7B active parameters, whose architecture (64 experts with top-8 gating and 2 shared experts) closely reflects modern production LLMs. All training data pipelines, scripts, logs, and checkpoints are publicly available to enable reproducible experimentation. Across six evaluation tasks, FLAME-MoE improves average accuracy by up to 3.4 points over dense baselines trained with identical FLOPs. Leveraging full training trace transparency, we present initial analyses showing that (i) experts increasingly specialize on distinct token subsets, (ii) co-activation matrices remain sparse, reflecting diverse expert usage, and (iii) routing behavior stabilizes early in training. All code, training logs, and model checkpoints are available at https://github.com/cmu-flame/FLAME-MoE.
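The gating scheme named in the abstract (64 experts, top-8 gating, 2 shared experts) can be sketched as a toy MoE layer. This is a minimal illustration, not the FLAME-MoE implementation, and it rests on stated assumptions: "64 experts" is read as 64 routed experts plus 2 separate always-active shared experts, the top-8 gate weights are softmax-normalized over the selected logits only, and scalar functions stand in for expert feed-forward networks. The names `top_k_gate` and `moe_forward` are hypothetical. Under the 64 + 2 reading, each token passes through 10 of 66 expert FFNs.

```python
import math
import random

# Configuration stated in the abstract; reading "64 experts" as 64 routed
# experts plus 2 separate shared experts is an assumption.
NUM_ROUTED = 64
TOP_K = 8
NUM_SHARED = 2


def top_k_gate(logits, k):
    """Pick the k highest-scoring routed experts and softmax their logits.

    Assumption: normalization is over the selected logits only; some MoE
    implementations apply softmax over all experts before selecting.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def moe_forward(x, logits, routed_experts, shared_experts, k=TOP_K):
    """Sum the always-on shared experts plus the gated top-k routed experts."""
    out = sum(expert(x) for expert in shared_experts)
    chosen, weights = top_k_gate(logits, k)
    out += sum(w * routed_experts[i](x) for i, w in zip(chosen, weights))
    return out, chosen


# Toy run: scalar "experts" stand in for feed-forward networks.
rng = random.Random(0)
routed = [lambda x, s=i + 1: s * x for i in range(NUM_ROUTED)]
shared = [lambda x: x for _ in range(NUM_SHARED)]
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
y, picked = moe_forward(1.0, router_logits, routed, shared)
print(len(picked), NUM_SHARED + TOP_K, NUM_ROUTED + NUM_SHARED)  # 8 10 66
```

Renormalizing over the selected experts keeps the routed gate weights summing to one for any k. Training-time details such as auxiliary load-balancing losses, which MoE training commonly adds, are omitted from this sketch.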

arXiv information

Authors	Hao Kang, Zichun Yu, Chenyan Xiong
Published	2025-05-26 17:06:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
