Autonomy-of-Experts Models

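Summary

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only a subset of the parameters and often outperforming dense models.
We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue that leads to suboptimal expert selection and ineffective learning.
To address this, we propose Autonomy-of-Experts (AoE), a new MoE paradigm in which experts autonomously select themselves to process inputs.
AoE builds on the insight that an expert is aware of its own capacity to process a token effectively, and that this awareness is reflected in the scale of its internal activations.
In AoE, routers are removed.
Instead, experts pre-compute internal activations for the inputs and are ranked by their activation norms.
Only the top-ranked experts continue with the forward pass; the others abort.
The overhead of pre-computing activations is reduced through a low-rank weight factorization.
This approach, in which each expert evaluates itself and is then compared with its partners, improves expert selection and makes learning more effective.
We pre-train language models with 700M to 4B parameters and show that AoE outperforms traditional MoE models at comparable efficiency.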

Abstract (original)

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router’s decision-making and the experts’ execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models having 700M up to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
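
The selection step described in the abstract (pre-compute activations, rank experts by activation norm, let only the top-ranked experts finish) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Expert` class, the two-factor `w_down`/`w_up` layout standing in for the low-rank factorization, the ReLU nonlinearity, the use of the L2 norm, and the unweighted sum of the selected experts' outputs. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vec))


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix, scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Hypothetical expert computing w_out @ relu(w_up @ (w_down @ x)).

    Assumption: the first weight matrix is factorized into w_down and w_up,
    so the cheap projection w_down @ x can be pre-computed and cached.
    """

    def __init__(self, d_model, d_low, d_hidden, rng):
        self.w_down = random_matrix(d_low, d_model, rng)
        self.w_up = random_matrix(d_hidden, d_low, rng)
        self.w_out = random_matrix(d_model, d_hidden, rng)

    def precompute(self, x):
        # Cheap low-rank activation that every expert computes.
        return matvec(self.w_down, x)

    def finish(self, low_rank_act):
        # Remaining part of the forward pass, run only by selected experts.
        hidden = [max(0.0, h) for h in matvec(self.w_up, low_rank_act)]
        return matvec(self.w_out, hidden)


def aoe_layer(experts, x, top_k):
    """Router-free selection: rank experts by the norm of their cached activation."""
    cached = [expert.precompute(x) for expert in experts]
    norms = [l2_norm(act) for act in cached]
    ranked = sorted(range(len(experts)), key=lambda i: norms[i], reverse=True)
    selected = ranked[:top_k]

    output = [0.0] * len(x)
    for i in selected:
        # Reuse the cached activation; unselected experts stop here.
        expert_out = experts[i].finish(cached[i])
        # Assumption: outputs are combined by an unweighted sum.
        output = [o + e for o, e in zip(output, expert_out)]
    return output, selected


rng = random.Random(0)
experts = [Expert(d_model=8, d_low=2, d_hidden=16, rng=rng) for _ in range(4)]
token = [rng.gauss(0.0, 1.0) for _ in range(8)]
output, selected = aoe_layer(experts, token, top_k=2)
print("selected experts:", selected)
```

In this sketch the only work wasted on unselected experts is the small `w_down` projection, which is the intuition behind reducing the pre-computation overhead with a low-rank factorization.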

arXiv information

Authors	Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
Published	2025-01-22 18:37:08+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
