3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

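Abstract

Although Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct telemedicine consultations that combine accurate diagnosis with professional dialogue remains underexplored.
This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations.
3MDBench simulates patient variability through four temperament-based Patient Agents, and an Assessor Agent jointly evaluates diagnostic accuracy and dialogue quality.
It includes 3013 cases across 34 diagnoses drawn from real-world telemedicine interactions, combining textual and image-based data.
The experimental study compares diagnostic strategies for popular LVLMs, including GPT-4o-mini, LLaVA-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct.
Multimodal dialogue with internal reasoning is shown to improve the F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning.
Moreover, injecting predictions from a diagnostic convolutional network into the LVLM's context boosts F1 by up to 20%.
Source code is available at https://anonymous.4open.science/r/3mdbench_acl-0511.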

Abstract (original)

Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. In this paper, we present 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through four temperament-based Patient Agents and an Assessor Agent that jointly evaluate diagnostic accuracy and dialogue quality. It includes 3013 cases across 34 diagnoses drawn from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for popular LVLMs, including GPT-4o-mini, LLaVA-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional network into the LVLM’s context boosts F1 by up to 20%. Source code is available at https://anonymous.4open.science/r/3mdbench_acl-0511.
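
The abstract's last two findings (injecting a convolutional network's predictions into the LVLM's context, and scoring with F1) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the prompt wording, the top-k cutoff, and the use of macro averaging are not taken from the paper, which may format the context and compute F1 differently.

```python
from typing import Dict, List


def build_context_with_cnn_hints(
    complaint: str, cnn_probs: Dict[str, float], top_k: int = 3
) -> str:
    """Prepend the top-k classifier predictions to the text sent to the LVLM.

    The prompt format is hypothetical, not the benchmark's actual template.
    """
    # Rank candidate diagnoses by predicted probability, highest first.
    ranked = sorted(cnn_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    hints = "\n".join(f"- {label}: {prob:.2f}" for label, prob in ranked)
    return (
        "An image classifier suggests these candidate diagnoses (may be wrong):\n"
        f"{hints}\n\n"
        f"Patient complaint: {complaint}"
    )


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch the classifier output is passed as plain text rather than as hidden features, so the same hint string could be prepended for any of the compared LVLMs without changing the model itself.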

arXiv information

Authors	Ivan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, Andrey Savchenko
Published	2025-06-02 16:50:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
