Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

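Summary

Recent advances in large language models (LLMs) have made it easier to develop multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often rely too heavily on unimodal biases (such as language bias and vision bias), which leads to incorrect answers on complex multimodal tasks. To investigate this problem, we propose a causal framework for interpreting biases in visual question answering (VQA) problems. Within this framework, we devise a causal graph that explains MLLMs' predictions on VQA problems, and we evaluate the causal effect of the biases through a detailed causal analysis. Motivated by the causal graph, we introduce MORE, a new dataset of 12,000 VQA instances. The dataset is designed to challenge the capabilities of MLLMs by requiring multi-hop reasoning and the overcoming of unimodal biases. Furthermore, to mitigate unimodal biases and improve the reasoning ability of MLLMs, we propose two strategies: a Decompose-Verify-Answer (DeVA) framework for MLLMs with limited access, and the refinement of open-source MLLMs through fine-tuning. Extensive quantitative and qualitative experiments provide valuable insights for future research. Our project page is at https://opencausalab.github.io/MORE.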

Summary (original)

Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from an over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within our framework, we devise a causal graph to elucidate the predictions of MLLMs on VQA problems, and assess the causal effect of biases through an in-depth causal analysis. Motivated by the causal graph, we introduce a novel MORE dataset, consisting of 12,000 VQA instances. This dataset is designed to challenge MLLMs’ abilities, necessitating multi-hop reasoning and the surmounting of unimodal biases. Furthermore, we propose two strategies to mitigate unimodal biases and enhance MLLMs’ reasoning capabilities, including a Decompose-Verify-Answer (DeVA) framework for limited-access MLLMs and the refinement of open-source MLLMs through fine-tuning. Extensive quantitative and qualitative experiments offer valuable insights for future research. Our project page is at https://opencausalab.github.io/MORE.
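The abstract does not spell out the paper's causal estimands, so the following is only a hypothetical illustration of the general idea behind quantifying language bias. The sketch intervenes on the visual input by removing the image, then compares the model's answers and accuracy with and without it. Everything in it is an assumption made for this sketch and is not part of MORE or DeVA: the model interface (a callable taking an image or `None` plus a question string), the function name `language_reliance_rate`, the metric names, and the toy models.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical model interface (assumption): (image or None, question) -> answer.
Model = Callable[[Optional[object], str], str]


def language_reliance_rate(
    model: Model, samples: Sequence[Tuple[object, str, str]]
) -> Dict[str, float]:
    """Compare answers with the real image against answers with the image removed.

    Each sample is (image, question, gold_answer). This is a simple ablation,
    not the causal analysis described in the paper.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    n = len(samples)
    agree = correct_full = correct_blind = 0
    for image, question, gold in samples:
        full = model(image, question)   # factual run: image + question
        blind = model(None, question)   # intervened run: image removed
        agree += int(full == blind)
        correct_full += int(full == gold)
        correct_blind += int(blind == gold)
    return {
        "agreement": agree / n,                  # how often the image changes nothing
        "acc_full": correct_full / n,
        "acc_question_only": correct_blind / n,
        "acc_drop_without_image": (correct_full - correct_blind) / n,
    }


# Toy models for illustration only (hypothetical).
def image_blind_model(image, question):
    # Ignores the image and always gives the "prior" answer.
    return "red"


def grounded_model(image, question):
    # Falls back to the prior without an image; otherwise reads the colour
    # from a toy image label such as "img_green_apple".
    if image is None:
        return "red"
    return image.split("_")[1]


samples = [
    ("img_red_apple", "What colour is the apple?", "red"),
    ("img_green_apple", "What colour is the apple?", "green"),
]

print(language_reliance_rate(image_blind_model, samples))
# {'agreement': 1.0, 'acc_full': 0.5, 'acc_question_only': 0.5, 'acc_drop_without_image': 0.0}
print(language_reliance_rate(grounded_model, samples))
# {'agreement': 0.5, 'acc_full': 1.0, 'acc_question_only': 0.5, 'acc_drop_without_image': 0.5}
```

`image_blind_model` shows full agreement between the two runs and a zero accuracy drop, which suggests its answers come from the question text alone. This contrast is a crude interventional probe. It does not replace a full causal analysis.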

arXiv information

Authors	Meiqi Chen, Yixin Cao, Yan Zhang, Chaochao Lu
Published	2024-04-03 17:18:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
