Glider: Global and Local Instruction-Driven Expert Router

Summary

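The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models specialized to particular domains.
This has enabled powerful, adaptive routing-based "Model MoErging" methods, which combine expert modules into an aggregate system with improved performance or generalization.
However, existing MoErging methods often prioritize generalization to unseen tasks at the expense of performance on held-in tasks, which limits their practical applicability in real-world deployment.
The authors observe that current token-level routing mechanisms neglect the global semantic context of the input task.
Because each token is routed independently, routing decisions do not incorporate the semantic properties of the task, which hinders effective expert selection for held-in tasks.
To address this, they propose the Global and Local Instruction Driven Expert Router (GLIDER), which integrates a multi-scale routing mechanism consisting of a semantic global router and a learned local router.
The global router leverages an LLM's reasoning capabilities over semantic context to improve expert selection.
Given the input query, the LLM generates a semantic task instruction that guides retrieval of the most relevant experts across all layers.
This global guidance is complemented by a local router that makes token-level routing decisions within each module, enabling finer control and better performance on unseen tasks.
Experiments with T5-based models on T0 and FLAN tasks show that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
The authors also run ablation experiments to examine GLIDER's components in more depth.
Overall, the experiments highlight the importance of multi-scale routing that leverages LLM-driven semantic reasoning for MoErging methods.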
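
The multi-scale routing idea summarized above can be illustrated with a small sketch. The abstract does not say how the global and local signals are scored or combined, so everything here is an assumption made for illustration: cosine similarity as the scoring function, an additive combination weighted by a hypothetical `alpha`, top-k selection, and the names `route_token`, `local_keys`, and `global_keys`. It should not be read as GLIDER's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, instruction_vec, local_keys, global_keys,
                alpha=1.0, top_k=2):
    """Return {expert_index: weight} for a single token.

    Assumed scheme (not taken from the paper):
      local score  = similarity of the token vector to each expert's local key
      global score = similarity of the task-instruction embedding to each
                     expert's global key (the same value for every token)
      final score  = local + alpha * global, then softmax over the top_k experts
    """
    scores = []
    for local_key, global_key in zip(local_keys, global_keys):
        local_score = cosine(token_vec, local_key)
        global_score = cosine(instruction_vec, global_key)
        scores.append(local_score + alpha * global_score)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])
    return dict(zip(top, weights))


# Toy data: three experts with 2-dimensional keys (made-up values).
local_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
token_vec = [0.0, 1.0]        # this token looks most like expert 1 locally
instruction_vec = [1.0, 0.0]  # the task instruction points at expert 0

with_global = route_token(token_vec, instruction_vec, local_keys, global_keys,
                          alpha=2.0, top_k=2)
local_only = route_token(token_vec, instruction_vec, local_keys, global_keys,
                         alpha=0.0, top_k=2)
```

In this toy setup, setting `alpha=0.0` reduces the router to purely token-level selection, while a positive `alpha` lets the task-level instruction shift which experts every token is sent to. That is the intuition behind pairing a global router with a local one.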

Abstract (original)

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to particular domains. This has enabled the creation of powerful and adaptive routing-based ‘Model MoErging’ methods with the goal of using expert modules to create an aggregate system with improved performance or generalization. However, existing MoErging methods often prioritize generalization to unseen tasks at the expense of performance on held-in tasks, which limits its practical applicability in real-world deployment scenarios. We observe that current token-level routing mechanisms neglect the global semantic context of the input task. This token-wise independence hinders effective expert selection for held-in tasks, as routing decisions fail to incorporate the semantic properties of the task. To address this, we propose, Global and Local Instruction Driven Expert Router (GLIDER) that integrates a multi-scale routing mechanism, encompassing a semantic global router and a learned local router. The global router leverages LLM’s advanced reasoning capabilities for semantic-related contexts to enhance expert selection. Given the input query and LLM, the router generates semantic task instructions that guide the retrieval of the most relevant experts across all layers. This global guidance is complemented by a local router that facilitates token-level routing decisions within each module, enabling finer control and enhanced performance on unseen tasks. Our experiments using T5-based models for T0 and FLAN tasks demonstrate that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks. We also perform ablations experiments to dive deeper into the components of GLIDER. Our experiments highlight the importance of our multi-scale routing that leverages LLM-driven semantic reasoning for MoErging methods.

arXiv information

Authors	Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen
Published	2024-10-09 17:59:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
