Visual Prompt Engineering for Medical Vision Language Models in Radiology

Summary (original)

Medical image classification in radiology faces significant challenges, particularly in generalizing to unseen pathologies. In contrast, CLIP offers a promising solution by leveraging multimodal learning to improve zero-shot classification performance. However, in the medical domain, lesions can be small and might not be well represented in the embedding space. Therefore, in this paper, we explore the potential of visual prompt engineering to enhance the capabilities of Vision Language Models (VLMs) in radiology. Leveraging BiomedCLIP, trained on extensive biomedical image-text pairs, we investigate the impact of embedding visual markers directly within radiological images to guide the model’s attention to critical regions. Our evaluation on the JSRT dataset, focusing on lung nodule malignancy classification, demonstrates that incorporating visual prompts (such as arrows, circles, and contours) significantly improves classification metrics including AUROC, AUPRC, F1 score, and accuracy. Moreover, the study provides attention maps, showcasing enhanced model interpretability and focus on clinically relevant areas. These findings underscore the efficacy of visual prompt engineering as a straightforward yet powerful approach to advance VLM performance in medical image analysis.
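The general idea described above, drawing a marker into the image pixels and then scoring the marked image against one text prompt per class, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code or BiomedCLIP's API: the helper names, the ring-drawing rule, the logit scale of 100, and the toy embeddings are all hypothetical. In a real pipeline the embeddings would come from the VLM's image encoder (applied to the marked radiograph) and text encoder (applied to class prompts such as "benign nodule" vs. "malignant nodule").

```python
import math
from typing import List, Sequence

# Grayscale image as rows of pixel intensities in [0, 1] (an assumption for this sketch).
Image = List[List[float]]


def draw_circle_marker(image: Image, cx: int, cy: int, radius: float,
                       thickness: float = 1.0, value: float = 1.0) -> Image:
    """Return a copy of `image` with a ring of `value` pixels centred on (cx, cy).

    Hypothetical helper: a pixel is overwritten when its distance from the
    centre is within thickness / 2 of `radius`; all other pixels are kept.
    """
    out = [row[:] for row in image]  # copy so the input image is not modified
    half = thickness / 2.0
    for y, row in enumerate(out):
        for x in range(len(row)):
            if abs(math.hypot(x - cx, y - cy) - radius) <= half:
                row[x] = value
    return out


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_emb: Sequence[float],
                    text_embs: Sequence[Sequence[float]],
                    logit_scale: float = 100.0) -> List[float]:
    """CLIP-style zero-shot scoring: softmax over scaled cosine similarities.

    `logit_scale` = 100 is an assumed value; real models learn their own scale.
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 9x9 "radiograph" with a ring drawn around a hypothetical nodule at (4, 4).
    blank = [[0.0] * 9 for _ in range(9)]
    marked = draw_circle_marker(blank, cx=4, cy=4, radius=3)
    for row in marked:
        print("".join("#" if p else "." for p in row))

    # Placeholder embeddings standing in for encoder outputs
    # (index 0: "benign" prompt, index 1: "malignant" prompt).
    image_emb = [0.9, 0.1, 0.2]
    text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.3]]
    print(zero_shot_probs(image_emb, text_embs))
```

Running the script prints the ring as `#` characters and a two-element probability vector in which the class whose text embedding is closest to the image embedding receives most of the mass. Because the marker is written into the pixels rather than supplied as an extra input, an unmodified CLIP-style encoder can consume the prompted image without architectural changes.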

arXiv information

Authors	Stefan Denner, Markus Bujotzek, Dimitrios Bounias, David Zimmerer, Raphael Stock, Paul F. Jäger, Klaus Maier-Hein
Published	2024-08-28 13:53:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
