Visual Prompt Engineering for Vision Language Models in Radiology

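Summary

Medical image classification plays an important role in clinical decision-making, but most models are restricted to a fixed set of predefined classes, which limits their adaptability to new conditions.
Contrastive Language-Image Pretraining (CLIP) offers a promising solution by enabling zero-shot classification through large-scale multimodal pretraining.
However, while CLIP captures global image content effectively, radiology requires a more localized focus on specific pathology regions to improve both interpretability and diagnostic accuracy.
To address this, we explore incorporating visual cues into zero-shot classification: visual markers such as arrows, bounding boxes, and circles are embedded directly into radiological images to guide the model's attention.
Evaluating on four public chest X-ray datasets, we demonstrate that visual markers improve AUROC by up to 0.185, highlighting their effectiveness in enhancing classification performance.
Furthermore, attention map analysis confirms that visual cues help the model focus on clinically relevant regions, leading to more interpretable predictions.
To support further research, we use public datasets and will release our code and preprocessing pipeline, providing a reference point for future work on localized classification in medical imaging.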
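
The marker-embedding idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how markers are rendered (color, line width, library), so the helper names `blank_image`, `draw_box`, and `draw_circle`, the 1-pixel outlines, and the list-of-rows grayscale image are all assumptions made for illustration.

```python
import math


def blank_image(width, height, value=0):
    """Create a grayscale image as a list of rows (hypothetical stand-in for an X-ray)."""
    return [[value] * width for _ in range(height)]


def draw_box(image, x0, y0, x1, y1, intensity=255):
    """Draw a 1-pixel rectangle outline in place (inclusive corners, clipped to the image)."""
    height, width = len(image), len(image[0])
    # Top and bottom edges.
    for x in range(max(x0, 0), min(x1, width - 1) + 1):
        for y in (y0, y1):
            if 0 <= y < height:
                image[y][x] = intensity
    # Left and right edges.
    for y in range(max(y0, 0), min(y1, height - 1) + 1):
        for x in (x0, x1):
            if 0 <= x < width:
                image[y][x] = intensity
    return image


def draw_circle(image, cx, cy, radius, intensity=255):
    """Draw a circle outline in place: pixels whose distance to the center rounds to `radius`."""
    for y, row in enumerate(image):
        for x in range(len(row)):
            if round(math.hypot(x - cx, y - cy)) == radius:
                row[x] = intensity
    return image


# Assumed usage: mark a region of interest before passing the image to a
# vision-language model for zero-shot classification.
marked = draw_box(blank_image(8, 8), 2, 2, 5, 5)
circled = draw_circle(blank_image(9, 9), 4, 4, 3)
```

Only the pixels on the marker outline change, so the region inside the marker keeps its original content. In practice an image library would likely be used instead of nested lists.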

Abstract (original)

Medical image classification plays a crucial role in clinical decision-making, yet most models are constrained to a fixed set of predefined classes, limiting their adaptability to new conditions. Contrastive Language-Image Pretraining (CLIP) offers a promising solution by enabling zero-shot classification through multimodal large-scale pretraining. However, while CLIP effectively captures global image content, radiology requires a more localized focus on specific pathology regions to enhance both interpretability and diagnostic accuracy. To address this, we explore the potential of incorporating visual cues into zero-shot classification, embedding visual markers, such as arrows, bounding boxes, and circles, directly into radiological images to guide model attention. Evaluating across four public chest X-ray datasets, we demonstrate that visual markers improve AUROC by up to 0.185, highlighting their effectiveness in enhancing classification performance. Furthermore, attention map analysis confirms that visual cues help models focus on clinically relevant areas, leading to more interpretable predictions. To support further research, we use public datasets and will release our code and preprocessing pipeline, providing a reference point for future work on localized classification in medical imaging.
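
The headline result is stated as an AUROC difference, so a small sketch of how such a difference is computed may help. The function below uses the standard pairwise (Mann-Whitney) definition of AUROC. The labels and scores are made-up toy values chosen for illustration, not results from the paper, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values only (NOT from the paper): the same four images scored
# without and with a visual marker.
labels = [1, 1, 0, 0]
scores_plain = [0.6, 0.4, 0.5, 0.3]
scores_marked = [0.8, 0.6, 0.5, 0.3]

improvement = auroc(labels, scores_marked) - auroc(labels, scores_plain)
```

An improvement "of up to 0.185" is a difference of this kind, measured on the real datasets rather than on toy inputs.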

arXiv information

Authors	Stefan Denner, Markus Bujotzek, Dimitrios Bounias, David Zimmerer, Raphael Stock, Klaus Maier-Hein
Published	2025-02-10 15:12:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
