Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

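Summary

Existing open-source multimodal large language models (MLLMs) generally follow a training process consisting of pre-training and supervised fine-tuning.
However, these models suffer from distribution shift, which limits their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance.
To address this, we introduce a preference optimization (PO) process to strengthen the multimodal reasoning capabilities of MLLMs.
Specifically, (1) on the data side, we design an automated preference data construction pipeline and use it to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset.
(2) On the model side, we explore integrating PO with MLLMs and develop a simple yet effective method, Mixed Preference Optimization (MPO), that improves multimodal CoT performance.
Our approach improves performance across multiple benchmarks, particularly on multimodal reasoning tasks.
Notably, our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and performing comparably to the 10x larger InternVL2-76B.
We hope this study inspires further advances in MLLMs.
Code, data, and models will be publicly released.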
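
The summary names Mixed Preference Optimization but does not spell out its loss. As a rough illustration of what "mixing" objectives can look like, the sketch below combines a DPO-style preference term with a supervised (generation) term on the chosen response. The choice of these two terms, the function names, the weights `w_pref` and `w_gen`, and `beta` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO-style preference loss on summed sequence log-probabilities.

    Rewards are the log-ratio of policy to reference model; the loss
    pushes the chosen response's reward above the rejected one's.
    """
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    return -log_sigmoid(margin)


def sft_loss(policy_chosen: float, num_tokens: int) -> float:
    """Average negative log-likelihood of the chosen response."""
    return -policy_chosen / num_tokens


def mixed_loss(policy_chosen: float, policy_rejected: float,
               ref_chosen: float, ref_rejected: float,
               num_tokens: int,
               w_pref: float = 1.0, w_gen: float = 1.0,
               beta: float = 0.1) -> float:
    """Weighted sum of the two terms (weights are hypothetical)."""
    pref = dpo_loss(policy_chosen, policy_rejected,
                    ref_chosen, ref_rejected, beta)
    gen = sft_loss(policy_chosen, num_tokens)
    return w_pref * pref + w_gen * gen


if __name__ == "__main__":
    # Toy log-probabilities for one (chosen, rejected) response pair.
    print(mixed_loss(-8.0, -14.0, -10.0, -12.0, num_tokens=5))
```

The preference term only sees the relative ranking of the two responses, while the generation term keeps the likelihood of the chosen response from drifting; combining them is one plausible way to train on preference pairs without degrading generation quality.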

Abstract (original)

Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly in Chain-of-Thought (CoT) performance. To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset, and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance. Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks. Notably, our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B. We hope this study could inspire further advancements in MLLMs. Code, data, and model shall be publicly released.

arXiv information

Authors	Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lewei Lu, Yu Qiao, Jifeng Dai
Published	2024-11-15 18:59:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
