Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

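Summary

Mixture-of-Experts (MoE) has been attracting increasing attention in research on Large Vision-Language Models (LVLMs).
It replaces a dense model with a sparse one, activating fewer parameters during inference while achieving comparable performance, which substantially reduces inference cost.
Existing MoE methods for LVLMs encourage different experts to process different tokens, and they typically use a router to predict the routing of each token.
However, these predictions are based solely on sample features and do not actually reveal the optimization directions of the tokens.
As a result, severe optimization interference can arise among the different tokens assigned to the same expert.
To address this problem, this paper proposes a new method based on token-level gradient analysis, called Solving Token Gradient Conflict (STGC).
Specifically, we first use token-level gradients to identify conflicting tokens within each expert.
We then add a specialized loss designed to eliminate conflicts among the tokens within each expert.
Our method can serve as a plug-in for a variety of Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness.
The code will be released at https://github.com/longrongyang/STGC.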
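The routing step described in the summary above (a router predicting, from each token's features alone, which experts process it) can be illustrated with a generic top-k softmax router. This is a minimal sketch under our own assumptions, not the STGC code: the names `route_tokens` and `moe_layer`, the linear gating weights `gate`, and the choice of k are hypothetical. It shows why only k of the E experts are active per token, which is where the inference savings of a sparse model come from.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(tokens: List[Vector], gate: List[Vector], k: int = 2) -> List[List[Tuple[int, float]]]:
    """Hypothetical top-k router: gate[e] is the weight vector of expert e.

    Returns, for each token, the k selected expert indices with their
    router probabilities renormalized to sum to 1.
    """
    routes = []
    for x in tokens:
        # Router logits depend only on the token's own features.
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate]
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        z = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / z) for e in top])
    return routes


def moe_layer(tokens: List[Vector], gate: List[Vector],
              experts: List[Callable[[Vector], Vector]], k: int = 2) -> List[Vector]:
    """Sparse forward pass: only the k routed experts are evaluated per token."""
    outputs = []
    for x, route in zip(tokens, route_tokens(tokens, gate, k)):
        y = [0.0] * len(x)
        for e, weight in route:
            expert_out = experts[e](x)
            y = [y_i + weight * o_i for y_i, o_i in zip(y, expert_out)]
        outputs.append(y)
    return outputs
```

For example, with `gate = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, -1.0]]` and k = 2, the token `[1.0, 0.0]` is sent to experts 0 and 2, while `[0.0, 1.0]` goes to experts 1 and 2; the other two experts are never evaluated for those tokens. Note that nothing in this routing looks at gradients, which is the gap the paper points out.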

Summary (original)

The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token. However, the predictions are based solely on sample features and do not truly reveal the optimization directions of tokens. This may lead to severe optimization interference between different tokens assigned to an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
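The two steps of STGC described above (identify conflicting tokens in an expert from token-level gradients, then add a loss that removes the conflict) can be sketched on a toy linear expert. This is an illustrative sketch under stated assumptions, not the authors' implementation: we assume each expert is a linear map W trained with a squared loss, that a token "conflicts" when the cosine similarity between its gradient and the mean gradient of all tokens in that expert falls below a threshold, and that the auxiliary loss is the mean router probability that flagged tokens still give to this expert. The abstract does not specify the exact conflict criterion or loss form, and the names `token_gradient`, `find_conflicting_tokens`, and `conflict_penalty` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def token_gradient(W: Matrix, x: Vector, t: Vector) -> Vector:
    """Gradient of 0.5 * ||W x - t||^2 w.r.t. W, flattened row-major.

    For this toy linear expert, dL/dW = (W x - t) x^T.
    """
    residual = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - t_i
                for row, t_i in zip(W, t)]
    return [r_i * x_j for r_i in residual for x_j in x]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(u * v for u, v in zip(a, b)) / (na * nb)


def find_conflicting_tokens(W: Matrix, tokens: List[Vector], targets: List[Vector],
                            threshold: float = 0.0) -> List[int]:
    """Assumed criterion: flag tokens whose gradient points against the
    mean gradient of all tokens routed to this expert."""
    grads = [token_gradient(W, x, t) for x, t in zip(tokens, targets)]
    n = len(grads)
    mean = [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]
    return [i for i, g in enumerate(grads) if cosine(g, mean) < threshold]


def conflict_penalty(router_probs: Vector, conflicting: List[int]) -> float:
    """Hypothetical auxiliary loss: mean probability that conflicting tokens
    still assign to this expert. Minimizing it pushes them to other experts."""
    if not conflicting:
        return 0.0
    return sum(router_probs[i] for i in conflicting) / len(conflicting)
```

With `W` as the 2x2 identity, three identical inputs `[1.0, 0.0]`, and targets `[0.0, 0.0]`, `[-1.0, 0.0]`, and `[3.0, 0.0]`, the first two tokens want to shrink `W[0][0]` while the third wants to grow it, so only token 2 is flagged. In a real LVLM the per-token gradients would come from automatic differentiation rather than a hand-derived formula, and the router probabilities would be differentiable model outputs rather than plain floats.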

arXiv information

Authors	Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
Published	2024-08-05 12:12:48+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
