Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2

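Summary

Radiology plays a pivotal role in modern medicine because of its non-invasive diagnostic capabilities.
However, writing unstructured medical reports by hand is time-consuming and error-prone, which creates a significant bottleneck in clinical workflows.
Although AI-generated radiology reports have advanced, producing detailed and accurate reports remains a challenge.
In this study, we evaluated several combinations of multimodal models that integrate computer vision and natural language processing to generate comprehensive radiology reports.
A pretrained Vision Transformer (ViT-B16) and a SWIN Transformer serve as the image encoders, and BART and GPT-2 serve as the text decoders.
Using chest X-ray images and reports from the IU-Xray dataset, we assessed four encoder-decoder pairings for report generation: SWIN Transformer-BART, SWIN Transformer-GPT-2, ViT-B16-BART, and ViT-B16-GPT-2.
Our aim was to identify the best combination among them.
SWIN-BART performed best of the four, achieving remarkable results on almost all evaluation metrics, including ROUGE, BLEU, and BERTScore.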
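
To give a feel for what an overlap metric such as ROUGE measures when a generated report is compared with a reference report, here is a minimal sketch of a unigram ROUGE-1 F1 score. It is an illustration only, not the authors' evaluation code: the function name `rouge1_f1`, the whitespace tokenization, and the example sentences are assumptions, and full ROUGE implementations typically add stemming and further variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two texts.

    Hypothetical helper for illustration; tokenization is plain
    lowercase whitespace splitting, with no stemming.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate tokens matched
    recall = overlap / len(ref_tokens)      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)


# Invented example sentences, not taken from IU-Xray.
reference_report = "the lungs are clear"
generated_report = "lungs are clear bilaterally"
score = rouge1_f1(reference_report, generated_report)
```

In this example three of the four tokens match on each side, so precision and recall are both 0.75 and the F1 score is 0.75. BLEU is likewise based on n-gram overlap but is precision-oriented, while BERTScore compares contextual embeddings rather than surface tokens.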

Summary (original)

Radiology plays a pivotal role in modern medicine due to its non-invasive diagnostic capabilities. However, the manual generation of unstructured medical reports is time consuming and prone to errors. It creates a significant bottleneck in clinical workflows. Despite advancements in AI-generated radiology reports, challenges remain in achieving detailed and accurate report generation. In this study we have evaluated different combinations of multimodal models that integrate Computer Vision and Natural Language Processing to generate comprehensive radiology reports. We employed a pretrained Vision Transformer (ViT-B16) and a SWIN Transformer as the image encoders. The BART and GPT-2 models serve as the textual decoders. We used Chest X-ray images and reports from the IU-Xray dataset to evaluate the usability of the SWIN Transformer-BART, SWIN Transformer-GPT-2, ViT-B16-BART and ViT-B16-GPT-2 models for report generation. We aimed at finding the best combination among the models. The SWIN-BART model performs as the best-performing model among the four models achieving remarkable results in almost all the evaluation metrics like ROUGE, BLEU and BERTScore.

arXiv information

Authors	Md. Rakibul Islam, Md. Zahid Hossain, Mustofa Ahmed, Most. Sharmin Sultana Samu
Published	2025-01-21 18:36:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
