Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Summary

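Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models that generalize zero-shot across a wide range of downstream vision-language tasks.
However, the diversity of training tasks, which differ in source and format, inevitably leads to task conflicts: different tasks compete for the same set of model parameters, resulting in sub-optimal instruction-following ability.
To address this, we propose Mixture of Cluster-conditional LoRA Experts (MoCLE), a novel Mixture of Experts (MoE) architecture designed to activate task-customized model parameters based on instruction clusters.
A separate universal expert is further incorporated to improve MoCLE's generalization to novel instructions.
Extensive experiments on 10 zero-shot tasks demonstrate the effectiveness of MoCLE.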
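
The summary gives no implementation details, so the following is only a minimal sketch of the general idea: an instruction is assigned to a cluster, the LoRA expert for that cluster is activated, and a universal expert is always applied alongside it. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-centroid assignment, the fixed mixing weight `gate`, and the names `LoRAExpert`, `assign_cluster`, and `ClusterConditionalLayer`.

```python
import numpy as np


class LoRAExpert:
    """Low-rank update delta(x) = B @ (A @ x); B starts at zero, as is usual for LoRA."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        return self.B @ (self.A @ x)


def assign_cluster(instruction_emb, centroids):
    """Return the index of the nearest centroid (Euclidean distance).

    Assumption: instructions are embedded and clustered ahead of time;
    the real clustering and routing procedure may differ.
    """
    dists = np.linalg.norm(centroids - instruction_emb, axis=1)
    return int(np.argmin(dists))


class ClusterConditionalLayer:
    """Frozen linear layer plus one LoRA expert per cluster and a universal expert."""

    def __init__(self, W, centroids, rank=4, gate=0.5, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                  # frozen base weight
        self.centroids = centroids  # shape: (num_clusters, emb_dim)
        self.gate = gate            # hypothetical fixed mixing weight
        self.experts = [LoRAExpert(d_in, d_out, rank, rng) for _ in centroids]
        self.universal = LoRAExpert(d_in, d_out, rank, rng)

    def forward(self, x, instruction_emb):
        # Pick the cluster-specific expert, then mix it with the universal expert.
        k = assign_cluster(instruction_emb, self.centroids)
        delta = self.gate * self.experts[k](x) + (1.0 - self.gate) * self.universal(x)
        return self.W @ x + delta, k
```

With this layout, two instructions that fall into different clusters update different expert parameters, while the universal expert is shared by every instruction, including ones that resemble no training cluster.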

Abstract (original)

Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks. However, the diversity of training tasks of different sources and formats would lead to inevitable task conflicts, where different tasks conflict for the same set of model parameters, resulting in sub-optimal instruction-following abilities. To address that, we propose the Mixture of Cluster-conditional LoRA Experts (MoCLE), a novel Mixture of Experts (MoE) architecture designed to activate the task-customized model parameters based on the instruction clusters. A separate universal expert is further incorporated to improve the generalization capabilities of MoCLE for novel instructions. Extensive experiments on 10 zero-shot tasks demonstrate the effectiveness of MoCLE.

arXiv information

Authors	Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang
Published	2023-12-19 18:11:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
