Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation

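Summary

Generative models have revolutionized artificial intelligence (AI), especially in multimodal applications. Adapting them to the medical domain, however, raises distinct challenges because medical data are complex and clinical accuracy is a strict requirement. This work introduces a framework designed specifically for multimodal medical data generation. By generating multi-view chest X-ray images together with their associated clinical reports, it bridges the gap between general-purpose vision-language models and the specialized requirements of healthcare. Using the MIMIC-CXR dataset, the proposed framework shows superior performance in generating high-fidelity images and semantically coherent reports. Quantitative evaluation in terms of FID and BLEU scores yields significant results, indicating the quality of the generated data. Notably, on downstream disease classification tasks the framework achieves performance comparable to, or even better than, real data, underscoring its potential as a tool for medical research and diagnostics. The study highlights the importance of domain-specific adaptation for making generative models relevant and useful in clinical applications, and paves the way for future advances in synthetic multimodal medical data generation.

Summary (original)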

Generative models have revolutionized Artificial Intelligence (AI), particularly in multimodal applications. However, adapting these models to the medical domain poses unique challenges due to the complexity of medical data and the stringent need for clinical accuracy. In this work, we introduce a framework specifically designed for multimodal medical data generation. By enabling the generation of multi-view chest X-rays and their associated clinical report, it bridges the gap between general-purpose vision-language models and the specialized requirements of healthcare. Leveraging the MIMIC-CXR dataset, the proposed framework shows superior performance in generating high-fidelity images and semantically coherent reports. Our quantitative evaluation reveals significant results in terms of FID and BLEU scores, showcasing the quality of the generated data. Notably, our framework achieves comparable or even superior performance compared to real data on downstream disease classification tasks, underlining its potential as a tool for medical research and diagnostics. This study highlights the importance of domain-specific adaptations in enhancing the relevance and utility of generative models for clinical applications, paving the way for future advancements in synthetic multimodal medical data generation.

arXiv information

Authors	Daniele Molino, Francesco di Feola, Linlin Shen, Paolo Soda, Valerio Guarrasi
Published	2025-05-02 08:07:24+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
