Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

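Summary

Medical Visual Question Answering (MedVQA) is a promising field for building clinical decision support systems, but progress is often held back by the available datasets, which can lack clinical complexity and visual diversity.
To address these gaps, we introduce Kvasir-VQA-x1, a new large-scale dataset for gastrointestinal (GI) endoscopy.
Our work substantially expands the original Kvasir-VQA by adding 159,549 new question-answer pairs designed to test deeper clinical reasoning.
We developed a systematic method that uses large language models to generate these questions, which are stratified by complexity so that a model's inference capabilities can be assessed more precisely.
To make sure the dataset prepares models for real-world clinical scenarios, we also introduced a variety of visual augmentations that mimic common imaging artifacts.
The dataset is structured to support two main evaluation tracks: one for standard VQA performance and one for testing model robustness against these visual perturbations.
By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings.
The dataset is fully accessible and adheres to the FAIR data principles, making it a valuable resource for the wider research community.
Code and data: https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1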

Summary (original)

Medical Visual Question Answering (MedVQA) is a promising field for developing clinical decision support systems, yet progress is often limited by the available datasets, which can lack clinical complexity and visual diversity. To address these gaps, we introduce Kvasir-VQA-x1, a new, large-scale dataset for gastrointestinal (GI) endoscopy. Our work significantly expands upon the original Kvasir-VQA by incorporating 159,549 new question-answer pairs that are designed to test deeper clinical reasoning. We developed a systematic method using large language models to generate these questions, which are stratified by complexity to better assess a model’s inference capabilities. To ensure our dataset prepares models for real-world clinical scenarios, we have also introduced a variety of visual augmentations that mimic common imaging artifacts. The dataset is structured to support two main evaluation tracks: one for standard VQA performance and another to test model robustness against these visual perturbations. By providing a more challenging and clinically relevant benchmark, Kvasir-VQA-x1 aims to accelerate the development of more reliable and effective multimodal AI systems for use in clinical settings. The dataset is fully accessible and adheres to FAIR data principles, making it a valuable resource for the wider research community. Code and data: https://github.com/Simula/Kvasir-VQA-x1 and https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1
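The two evaluation tracks described in the abstract (standard VQA performance, and robustness under visual perturbations) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the authors' pipeline: `perturb`, `accuracy`, `toy_model`, the tiny grayscale "images", and the brightness-plus-noise perturbation are all assumptions chosen to keep the example self-contained. The abstract does not specify which augmentations or metrics the dataset actually uses.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an endoscopy frame: grayscale pixels in 0..255.
Image = List[List[int]]
Sample = Tuple[Image, str, str]  # (image, question, reference answer)


def perturb(image: Image, brightness: int = 20,
            noise_std: float = 5.0, seed: int = 0) -> Image:
    """Assumed perturbation: brightness shift plus Gaussian noise, clamped to 0..255."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return [
        [min(255, max(0, round(px + brightness + rng.gauss(0.0, noise_std))))
         for px in row]
        for row in image
    ]


def accuracy(model: Callable[[Image, str], str], samples: List[Sample],
             transform: Optional[Callable[[Image], Image]] = None) -> float:
    """Exact-match accuracy; `transform` is applied to every image when given."""
    correct = 0
    for image, question, answer in samples:
        shown = transform(image) if transform else image
        if model(shown, question).strip().lower() == answer.strip().lower():
            correct += 1
    return correct / len(samples)


def toy_model(image: Image, question: str) -> str:
    """Toy 'model' that thresholds mean intensity; it ignores the question."""
    flat = [px for row in image for px in row]
    return "bright" if sum(flat) / len(flat) > 128 else "dark"


samples: List[Sample] = [
    ([[10, 20], [30, 40]], "Is the frame bright or dark?", "dark"),
    ([[200, 210], [220, 230]], "Is the frame bright or dark?", "bright"),
    ([[120, 125], [118, 122]], "Is the frame bright or dark?", "dark"),
]

standard = accuracy(toy_model, samples)                     # track 1: original images
robust = accuracy(toy_model, samples, transform=perturb)    # track 2: perturbed images
print(f"standard={standard:.2f} robust={robust:.2f}")
```

Comparing the two scores on the same samples shows how much a model's answers depend on image conditions: a sample whose intensity sits near the toy model's decision threshold can flip once the perturbation is applied, which is the kind of fragility a robustness track is meant to expose.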

arXiv information

Authors	Sushant Gautam, Michael A. Riegler, Pål Halvorsen
Published	2025-06-11 17:31:38+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
