Improving Medical Visual Representations via Radiology Report Generation

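Summary

Vision-language pretraining has been shown to produce high-quality visual encoders that transfer efficiently to downstream computer vision tasks.
Contrastive learning approaches are increasingly adopted for medical vision-language pretraining (MVLP), but recent developments in generative AI offer new modeling alternatives.
This paper introduces RadTex, a CNN-encoder, transformer-decoder architecture optimized for radiology.
The authors explore bidirectional captioning as an alternative MVLP strategy and demonstrate that RadTex's captioning pretraining is competitive with established contrastive methods, achieving a CheXpert macro-AUC of 89.4%.
In addition, RadTex's lightweight text decoder not only generates clinically relevant radiology reports (macro-F1 score of 0.349) but also provides targeted, interactive responses, highlighting the utility of bidirectional captioning in advancing medical image analysis.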

Abstract (original)

Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. Contrastive learning approaches have increasingly been adopted for medical vision language pretraining (MVLP), yet recent developments in generative AI offer new modeling alternatives. This paper introduces RadTex, a CNN-encoder transformer-decoder architecture optimized for radiology. We explore bidirectional captioning as an alternative MVLP strategy and demonstrate that RadTex’s captioning pretraining is competitive with established contrastive methods, achieving a CheXpert macro-AUC of 89.4%. Additionally, RadTex’s lightweight text decoder not only generates clinically relevant radiology reports (macro-F1 score of 0.349), but also provides targeted, interactive responses, highlighting the utility of bidirectional captioning in advancing medical image analysis.
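The abstract reports two macro-averaged metrics (macro-AUC and macro-F1) without defining them. The sketch below illustrates the usual convention, assumed here, that "macro" means an unweighted mean of per-label scores. It is not the authors' evaluation code: the label names, ground-truth values, scores, and the 0.5 decision threshold are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Sequence


def auc_binary(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_average(per_label: Sequence[float]) -> float:
    """Unweighted mean over labels: every label counts equally."""
    return sum(per_label) / len(per_label)


# Hypothetical toy data: two findings, four studies each.
truth: Dict[str, List[int]] = {
    "label_a": [1, 0, 1, 0],
    "label_b": [0, 0, 1, 1],
}
scores: Dict[str, List[float]] = {
    "label_a": [0.9, 0.2, 0.6, 0.7],
    "label_b": [0.1, 0.4, 0.8, 0.9],
}
THRESHOLD = 0.5  # assumed cutoff for turning scores into binary predictions

per_label_auc = [auc_binary(truth[k], scores[k]) for k in truth]
per_label_f1 = [
    f1_binary(truth[k], [1 if s >= THRESHOLD else 0 for s in scores[k]])
    for k in truth
]

macro_auc = macro_average(per_label_auc)
macro_f1 = macro_average(per_label_f1)
print(f"macro-AUC = {macro_auc:.3f}, macro-F1 = {macro_f1:.3f}")
```

Because the mean is unweighted, a rare finding influences the macro score as much as a common one, which is one reason macro averaging is often preferred for imbalanced multi-label problems.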

arXiv information

Authors	Keegan Quigley, Miriam Cha, Josh Barua, Geeticka Chauhan, Seth Berkowitz, Steven Horng, Polina Golland
Published	2025-01-10 16:51:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
