Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis

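Summary

Foundation models trained on vast amounts of data with self-supervised techniques have emerged as a promising frontier for advancing artificial intelligence (AI) applications in medicine.
This study evaluates three vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks.
The models were assessed on classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs.
The self-supervised RAD-DINO consistently excelled at segmentation, while the text-supervised CheXagent showed superior classification performance.
BiomedCLIP performed inconsistently across tasks.
A custom segmentation model that integrates global and local features substantially improved performance for all foundation models, particularly on the challenging pneumothorax segmentation task.
The findings highlight that pre-training methodology significantly influences model performance on specific downstream tasks.
For fine-grained segmentation tasks, models trained without text supervision performed better, while text-supervised models offered advantages in classification and interpretability.
These insights provide guidance for selecting foundation models based on specific clinical applications in radiology.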

Abstract (original)

Foundation models, trained on vast amounts of data using self-supervised techniques, have emerged as a promising frontier for advancing artificial intelligence (AI) applications in medicine. This study evaluates three different vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs. Self-supervised RAD-DINO consistently excelled in segmentation tasks, while text-supervised CheXagent demonstrated superior classification performance. BiomedCLIP showed inconsistent performance across tasks. A custom segmentation model that integrates global and local features substantially improved performance for all foundation models, particularly for challenging pneumothorax segmentation. The findings highlight that pre-training methodology significantly influences model performance on specific downstream tasks. For fine-grained segmentation tasks, models trained without text supervision performed better, while text-supervised models offered advantages in classification and interpretability. These insights provide guidance for selecting foundation models based on specific clinical applications in radiology.
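
The abstract does not describe how the custom segmentation model combines global and local features, so the following is only a minimal sketch of one common way to do it, not the authors' architecture. It assumes a vision encoder that returns a grid of per-patch embeddings plus a single image-level embedding; the function names, array shapes, and the random linear head are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def fuse_global_local(patch_tokens, global_token):
    """Append one image-level embedding to every patch embedding.

    patch_tokens: array of shape (H, W, D_local), one vector per image patch.
    global_token: array of shape (D_global,), one vector for the whole image.
    Returns an array of shape (H, W, D_local + D_global).
    """
    h, w, _ = patch_tokens.shape
    # Repeat the global vector at every grid position (a read-only view).
    tiled = np.broadcast_to(global_token, (h, w, global_token.shape[0]))
    return np.concatenate([patch_tokens, tiled], axis=-1)


def patch_logits(fused, weights, bias):
    """Hypothetical per-patch linear head: (H, W, D) @ (D,) -> (H, W)."""
    return fused @ weights + bias


# Stand-in data: a 14x14 patch grid with 8-dim local and 4-dim global features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(14, 14, 8))
cls = rng.normal(size=(4,))

fused = fuse_global_local(patches, cls)
head_weights = rng.normal(size=(12,))  # untrained, illustrative only
mask = patch_logits(fused, head_weights, 0.0) > 0  # coarse patch-level mask

print(fused.shape, mask.shape)  # (14, 14, 12) (14, 14)
```

In this sketch every patch sees the same image-level context next to its own local features, which is one plausible reading of "integrates global and local features"; a real segmentation model would train the head and upsample the patch-level mask to pixel resolution.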

arXiv information

Authors	Frank Li, Hari Trivedi, Bardia Khosravi, Theo Dapamede, Mohammadreza Chavoshi, Abdulhameed Dere, Rohan Satya Isaac, Aawez Mansuri, Janice Newsome, Saptarshi Purkayastha, Judy Gichoya
Published	2025-04-22 17:20:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
