LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

Summary

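Mixture of Experts (MoE) architectures have recently improved the scalability and adaptability of large language models (LLMs) for continual multimodal learning.
However, extending these models efficiently to handle a sequence of tasks remains difficult.
As new tasks arrive, naive model expansion causes rapid parameter growth, while modifying shared routing components often triggers catastrophic forgetting and degrades previously learned knowledge.
To address these problems, we propose LLaVA-CMoE, a continual learning framework for LLMs that needs no replay data from earlier tasks and provides both parameter efficiency and robust knowledge retention.
Our approach introduces a Probe-Guided Knowledge Extension mechanism, which uses probe experts to decide dynamically when and where new experts should be added, so that parameter expansion is adaptive, minimal, and matched to task complexity.
We also present a Probabilistic Task Locator that assigns each task its own lightweight router.
Because task labels are unknown at inference time, we use a VAE-based reconstruction strategy that matches input distributions to identify the most suitable router, which allows automatic and accurate expert allocation.
This design mitigates routing conflicts and catastrophic forgetting, and enables robust continual learning without explicit task labels.
Extensive experiments on the CoIN benchmark, which covers eight diverse VQA tasks, show that LLaVA-CMoE achieves strong continual learning performance with a compact model size and significantly reduces forgetting and parameter overhead compared with prior methods.
These results demonstrate the effectiveness and scalability of our approach for parameter-efficient continual learning in large language models.
Our code will be open-sourced soon.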
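
The router-selection idea described above (choose the task whose reconstruction model best reproduces the input, then use that task's router) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `routers` mapping, and the feature vectors are hypothetical, and a toy "mean reconstructor" stands in for a trained per-task VAE so that the example runs with the standard library only.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Reconstructor = Callable[[Vector], List[float]]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input feature and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def locate_task(x: Vector, reconstructors: Dict[str, Reconstructor]) -> str:
    """Return the task whose reconstructor reproduces x with the lowest error."""
    return min(
        reconstructors,
        key=lambda task: reconstruction_error(x, reconstructors[task](x)),
    )


def make_mean_reconstructor(samples: Sequence[Vector]) -> Reconstructor:
    """Toy stand-in for a trained per-task VAE (assumption for illustration):
    it always 'reconstructs' the mean feature of the task's training samples."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    return lambda _x: list(mean)


# Hypothetical per-task reconstructors built from tiny made-up feature sets.
reconstructors = {
    "task_a": make_mean_reconstructor([[0.0, 0.1], [0.2, -0.1]]),
    "task_b": make_mean_reconstructor([[5.0, 5.2], [4.8, 5.0]]),
}
# Hypothetical mapping from task name to that task's dedicated router.
routers = {"task_a": "router_a", "task_b": "router_b"}

# At inference time no task label is given; pick the router by best reconstruction.
chosen = locate_task([4.9, 5.1], reconstructors)
print(routers[chosen])
```

In a real system each reconstructor would be a model trained on the features of one task, and the selected entry in `routers` would be an actual routing module rather than a string.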

Summary (original)

Mixture of Experts (MoE) architectures have recently advanced the scalability and adaptability of large language models (LLMs) for continual multimodal learning. However, efficiently extending these models to accommodate sequential tasks remains challenging. As new tasks arrive, naive model expansion leads to rapid parameter growth, while modifying shared routing components often causes catastrophic forgetting, undermining previously learned knowledge. To address these issues, we propose LLaVA-CMoE, a continual learning framework for LLMs that requires no replay data of previous tasks and ensures both parameter efficiency and robust knowledge retention. Our approach introduces a Probe-Guided Knowledge Extension mechanism, which uses probe experts to dynamically determine when and where new experts should be added, enabling adaptive and minimal parameter expansion tailored to task complexity. Furthermore, we present a Probabilistic Task Locator that assigns each task a dedicated, lightweight router. To handle the practical issue that task labels are unknown during inference, we leverage a VAE-based reconstruction strategy to identify the most suitable router by matching input distributions, allowing automatic and accurate expert allocation. This design mitigates routing conflicts and catastrophic forgetting, enabling robust continual learning without explicit task labels. Extensive experiments on the CoIN benchmark, covering eight diverse VQA tasks, demonstrate that LLaVA-CMoE delivers strong continual learning performance with a compact model size, significantly reducing forgetting and parameter overhead compared to prior methods. These results showcase the effectiveness and scalability of our approach for parameter-efficient continual learning in large language models. Our code will be open-sourced soon.

arXiv information

Authors	Hengyuan Zhao,Ziqin Wang,Qixin Sun,Kaiyou Song,Yilin Li,Xiaolin Hu,Qingpei Guo,Si Liu
Published	2025-06-13 11:04:13+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
