MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models

Summary

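There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs).
This paper investigates whether incorporating pixel-level information through segmentation masks can improve the fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation.
We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to use semantic segmentation masks alongside CXRs to generate radiology reports.
We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs.
Building on the architecture of MAIRA, a CXR-specialised report generation model, we then integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and use mask-aware prompting to generate draft radiology reports.
Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines.
We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently performs comparably or better.
The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.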

Summary (original)

There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architectures of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently demonstrates comparable or superior performance. The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.
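The abstract mentions a trainable segmentation tokens extractor but does not describe its internals. As a rough illustration of how a binary mask can be turned into tokens an LLM can consume, here is a minimal sketch of masked average pooling followed by a linear projection. This is a generic technique chosen for illustration, not MAIRA-Seg's actual extractor; the function name `masked_average_pool`, the array shapes, and the projection matrix `W_proj` are all hypothetical assumptions.

```python
import numpy as np


def masked_average_pool(features: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Pool patch features inside each mask into one token per structure.

    Hypothetical helper for illustration only (not the paper's extractor).
    features: (H, W, D) grid of patch embeddings from an image encoder.
    masks:    (K, H, W) binary masks, one per segmented structure,
              assumed already resized to the patch grid.
    Returns:  (K, D) array, one pooled "segmentation token" per mask.
    """
    features = np.asarray(features, dtype=float)
    if features.shape[:2] != masks.shape[1:]:
        raise ValueError("mask and feature grids must have the same H, W")
    m = masks.astype(float)                               # (K, H, W) as 0/1 weights
    summed = np.einsum("khw,hwd->kd", m, features)        # sum of features inside each mask
    counts = m.sum(axis=(1, 2))[:, None]                  # number of patches per mask, (K, 1)
    # Empty masks yield an all-zero token instead of dividing by zero.
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)


# Toy usage with random features; shapes are arbitrary assumptions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 8))            # 4x4 patch grid, 8-dim features
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :] = True                        # toy region standing in for one structure
masks[1, 2:, 2:] = True                       # toy region standing in for another
tokens = masked_average_pool(feats, masks)    # (2, 8)

# Hypothetical trainable projection into a 16-dim LLM embedding space.
W_proj = rng.normal(size=(8, 16))
llm_tokens = tokens @ W_proj
print(llm_tokens.shape)  # (2, 16)
```

Averaging keeps one fixed-size token per structure regardless of mask area. In a trained model, `W_proj` (and possibly the pooling itself) would be learned, and the resulting tokens would be placed into the prompt next to the image tokens.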

arXiv information

Authors	Harshita Sharma, Valentina Salvatelli, Shaury Srivastav, Kenza Bouzid, Shruthi Bannur, Daniel C. Castro, Maximilian Ilse, Sam Bond-Taylor, Mercy Prasanna Ranjit, Fabian Falck, Fernando Pérez-García, Anton Schwaighofer, Hannah Richardson, Maria Teodora Wetscherek, Stephanie L. Hyland, Javier Alvarez-Valle
Published	2024-11-18 08:13:22+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
