An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Abstract

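Multimodal large language models (MLLMs) fine-tuned on multimodal instruction datasets have shown strong performance on multimodal tasks.
However, because MLLMs typically contain billions of parameters, fine-tuning all of them is difficult.
To address this, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs.
Our goal is to identify effective ways to improve MLLM performance when only a limited number of parameters are trained.
This paper presents an empirical study that uses four popular PEFT methods to fine-tune the LLM component of open-source MLLMs.
The analysis covers the impact of PEFT methods on different models, the parameters and location of the PEFT module, the size of the fine-tuning data, model stability under PEFT methods, MLLM generalization, and hallucination.
We evaluate the four PEFT methods on seven datasets from two categories: unseen and seen datasets.
Across all experiments, the adapter is the best-performing PEFT method.
In addition, fine-tuning the connector layers improves performance in most MLLMs.
Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.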

Abstract (original)

Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for enhancing the performance of MLLMs in scenarios where only a limited number of parameters are trained. This paper conducts empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. We present a comprehensive analysis that encompasses various aspects, including the impact of PEFT methods on various models, parameters and location of the PEFT module, size of fine-tuning data, model stability based on PEFT methods, MLLM’s generalization, and hallucination. We evaluated four PEFT methods on seven datasets from two different categories: unseen and seen datasets. Across all experiments, we show that the adapter is the best-performing PEFT method. At the same time, fine-tuning the connector layers leads to improved performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.
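
To give a rough sense of what "only a limited number of parameters are trained" can mean for an adapter, the sketch below counts the parameters added by a standard bottleneck adapter (a down-projection followed by an up-projection, each with a bias). This is an illustration under stated assumptions, not the configuration used in the paper: the function names, the hidden size, the bottleneck size, the layer count, the two-adapters-per-layer placement, and the 7B base model size are all hypothetical values chosen for the example.

```python
def adapter_param_count(hidden_size: int, bottleneck_size: int) -> int:
    """Parameters in one bottleneck adapter (weights plus biases)."""
    # Down-projection: hidden_size -> bottleneck_size
    down = hidden_size * bottleneck_size + bottleneck_size
    # Up-projection: bottleneck_size -> hidden_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def trainable_fraction(base_params: int, hidden_size: int, bottleneck_size: int,
                       num_layers: int, adapters_per_layer: int = 2) -> float:
    """Share of all parameters that are trainable when only adapters are trained."""
    added = (adapter_param_count(hidden_size, bottleneck_size)
             * num_layers * adapters_per_layer)
    return added / (base_params + added)


if __name__ == "__main__":
    # Hypothetical LLM: 7B frozen parameters, 32 layers, hidden size 4096.
    frac = trainable_fraction(
        base_params=7_000_000_000,
        hidden_size=4096,
        bottleneck_size=64,  # assumed bottleneck width, not from the paper
        num_layers=32,
    )
    print(f"trainable share: {frac:.4%}")
```

Under these assumed sizes the adapters account for well under one percent of the total parameter count, which is the kind of budget PEFT methods target.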

arXiv information

Authors	Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan
Published	2024-06-07 17:58:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
