Structure Causal Models and LLMs Integration in Medical Visual Question Answering

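Summary

Medical Visual Question Answering (MedVQA) aims to answer medical questions based on medical images. Because medical data is complex, however, it contains confounders that are difficult to observe, and bias between images and questions is unavoidable. This cross-modal bias makes it hard to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task that removes the relative confounding effect between the image and the question, keeping question answering (QA) accurate. We are the first to introduce a causal graph structure that represents the interaction between visual and textual elements and explicitly captures how different questions influence visual features. During optimization, we apply mutual information to discover spurious correlations and propose a multi-variable resampling front-door adjustment method to remove the relative confounding effect. We also introduce a prompt strategy that combines multiple prompt forms to improve the model's ability to understand complex medical data and answer accurately. Extensive experiments on three MedVQA datasets show that 1) our method significantly improves MedVQA accuracy, and 2) our method achieves true causal correlations even on complex medical data.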

Summary (original)

Medical Visual Question Answering (MedVQA) aims to answer medical questions according to medical images. However, the complexity of medical data leads to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task, which effectively eliminates the relative confounding effect between the image and the question to ensure the precision of the question-answering (QA) session. We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements, explicitly capturing how different questions influence visual features. During optimization, we apply the mutual information to discover spurious correlations and propose a multi-variable resampling front-door adjustment method to eliminate the relative confounding effect, which aims to align features based on their true causal relevance to the question-answering task. In addition, we also introduce a prompt strategy that combines multiple prompt forms to improve the model’s ability to understand complex medical data and answer accurately. Extensive experiments on three MedVQA datasets demonstrate that 1) our method significantly improves the accuracy of MedVQA, and 2) our method achieves true causal correlations in the face of complex medical data.
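The front-door adjustment named in the abstract can be illustrated with the textbook single-mediator formula, P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'). The sketch below is only that textbook case on a toy binary model, not the paper's multi-variable resampling variant; the variables U (hidden confounder), X, M, Y and every probability table are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical toy structural causal model with binary variables:
#   U -> X, X -> M, M -> Y, U -> Y   (U is the unobserved confounder)
p_u = np.array([0.5, 0.5])                       # P(U)
p_x_given_u = np.array([[0.8, 0.2],              # P(X | U=0)
                        [0.3, 0.7]])             # P(X | U=1)
p_m_given_x = np.array([[0.9, 0.1],              # P(M | X=0)
                        [0.2, 0.8]])             # P(M | X=1)
# p_y_given_mu[m, u, y] = P(Y=y | M=m, U=u)
p_y_given_mu = np.array([[[0.9, 0.1], [0.6, 0.4]],
                         [[0.5, 0.5], [0.1, 0.9]]])

# Full joint P(u, x, m, y), then marginalize U to get what an observer sees.
joint = np.einsum("u,ux,xm,muy->uxmy", p_u, p_x_given_u, p_m_given_x, p_y_given_mu)
observed = joint.sum(axis=0)                     # P(x, m, y)


def front_door(p_xmy):
    """Estimate P(y | do(x)) from the observational joint P(x, m, y)."""
    p_x = p_xmy.sum(axis=(1, 2))                 # P(x)
    p_xm = p_xmy.sum(axis=2)                     # P(x, m)
    m_given_x = p_xm / p_x[:, None]              # P(m | x)
    y_given_xm = p_xmy / p_xm[:, :, None]        # P(y | x, m)
    # Inner sum over x': sum_x' P(x') P(y | m, x')
    inner = np.einsum("x,xmy->my", p_x, y_given_xm)
    # Outer sum over m: sum_m P(m | x) * inner[m, y]
    return np.einsum("xm,my->xy", m_given_x, inner)


# Ground truth from the model itself: intervene on X, keep U at its prior.
truth = np.einsum("u,xm,muy->xy", p_u, p_m_given_x, p_y_given_mu)

# Naive conditional P(y | x), which is biased by the hidden confounder.
naive = observed.sum(axis=1) / observed.sum(axis=(1, 2))[:, None]

estimate = front_door(observed)
print("front-door:", estimate)
print("truth:     ", truth)
print("naive:     ", naive)
```

In this toy model the front-door estimate recovers the interventional distribution without ever observing U, while the plain conditional P(y | x) does not. That gap is the kind of confounding effect the abstract refers to.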

arXiv information

Authors	Zibo Xu, Qiang Li, Weizhi Nie, Weijie Wang, Anan Liu
Published	2025-05-05 14:57:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
