MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts

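Abstract

Although prompt-based multimodal fusion methods have demonstrated parameter efficiency, their limited adaptivity and expressiveness often lead to suboptimal performance compared with other tuning approaches.
This paper addresses these limitations by decomposing the vanilla prompt so that it adaptively captures instance-level features.
Building on this decomposition, it introduces the mixture of prompt experts (MoPE) technique to increase the expressiveness of prompt tuning.
MoPE leverages multimodal pairing priors to route the most effective prompt for each instance.
Compared with vanilla prompting, MoPE-based fusion is more expressive and scales more effectively with both the training data and the total number of trainable parameters.
The authors also investigate regularization terms for expert routing, which lead to emergent expert specialization during training and pave the way for interpretable soft prompting.
Extensive experiments on six multimodal datasets spanning four modalities show that the method achieves state-of-the-art results for prompt fusion, matching or even surpassing fine-tuning while requiring only 0.8% of the trainable parameters.
Code will be released at https://github.com/songrise/MoPE.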

Abstract (original)

Despite the demonstrated parameter efficiency of prompt-based multimodal fusion methods, their limited adaptivity and expressiveness often result in suboptimal performance compared to other tuning approaches. In this paper, we address these limitations by decomposing the vanilla prompts to adaptively capture instance-level features. Building upon this decomposition, we introduce the mixture of prompt experts (MoPE) technique to enhance the expressiveness of prompt tuning. MoPE leverages multimodal pairing priors to route the most effective prompt on a per-instance basis. Compared to vanilla prompting, our MoPE-based fusion method exhibits greater expressiveness, scaling more effectively with the training data and the overall number of trainable parameters. We also investigate regularization terms for expert routing, which lead to emergent expert specialization during training, paving the way for interpretable soft prompting. Extensive experiments across six multimodal datasets spanning four modalities demonstrate that our method achieves state-of-the-art results for prompt fusion, matching or even surpassing the performance of fine-tuning while requiring only 0.8% of the trainable parameters. Code will be released: https://github.com/songrise/MoPE.
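The idea of routing a prompt on a per-instance basis can be illustrated with a small sketch. The abstract does not specify the router or the form of the experts, so everything below is an assumption made for illustration: a softmax router scored by dot products, a soft (weighted) blend of experts rather than a hard choice, and the hypothetical names `route_prompt`, `router_keys`, and `experts`. It is not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompt(pair_feature, router_keys, experts):
    """Blend prompt experts with weights derived from the paired modality.

    pair_feature: feature vector of the complementary modality (length d)
    router_keys:  one key vector per expert (hypothetical parameters)
    experts:      one prompt vector per expert (hypothetical parameters)
    """
    # Assumed scoring rule: dot product between the feature and each key.
    scores = [sum(f * w for f, w in zip(pair_feature, key)) for key in router_keys]
    weights = softmax(scores)
    prompt_dim = len(experts[0])
    # Soft routing: the instance prompt is a convex combination of experts.
    prompt = [
        sum(weights[i] * experts[i][j] for i in range(len(experts)))
        for j in range(prompt_dim)
    ]
    return weights, prompt


# Toy parameters; in a real model these would be learned.
rng = random.Random(0)
num_experts, feat_dim, prompt_dim = 4, 8, 6
router_keys = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_experts)]
experts = [[rng.gauss(0, 1) for _ in range(prompt_dim)] for _ in range(num_experts)]

# Two instances with different paired-modality features.
instance_a = [rng.gauss(0, 1) for _ in range(feat_dim)]
instance_b = [rng.gauss(0, 1) for _ in range(feat_dim)]
weights_a, prompt_a = route_prompt(instance_a, router_keys, experts)
weights_b, prompt_b = route_prompt(instance_b, router_keys, experts)
```

Because the routing weights depend on each instance's paired feature, two different inputs generally receive different expert mixtures, which is the contrast with a single vanilla prompt shared by all instances.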

arXiv information

Authors	Ruixiang Jiang, Lingbo Liu, Changwen Chen
Published	2024-09-11 09:19:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
