Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models

Summary

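Prompt learning is one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification tasks in few-shot scenarios.
However, most current prompt learning methods use only text prompts and ignore the particular structures of biomedical images, such as complex anatomical structures and subtle pathological features.
This work proposes Biomed-DPT, a knowledge-enhanced dual-modality prompt tuning technique.
For the text prompt, Biomed-DPT constructs a dual prompt consisting of template-driven clinical prompts and large language model (LLM)-driven domain-adapted prompts, then extracts clinical knowledge from the domain-adapted prompts through knowledge distillation.
For the vision prompt, Biomed-DPT introduces a zero vector as a soft prompt to leverage attention re-weighting, so that the model avoids focusing on non-diagnostic regions and recognizing non-critical pathological features.
Biomed-DPT achieves an average classification accuracy of 66.14% across 11 biomedical image datasets covering 9 modalities and 10 organs, reaching 78.06% on base classes and 75.97% on novel classes, and surpasses the Context Optimization (CoOp) method by 6.20%, 3.78%, and 8.04%, respectively.
The code is available at https://github.com/Kanyooo/Biomed-DPT.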
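The knowledge distillation step for the text prompts can be illustrated with a generic, temperature-softened KL-divergence loss. This is only a minimal sketch under stated assumptions: the abstract does not specify the loss, so the Hinton-style formulation, the function names, the temperature value, and the toy logits below are all hypothetical and are not taken from the authors' code.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: a generic distillation loss, not the loss used in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor is the usual rescaling used with softened distributions.
    return kl * temperature ** 2

# Hypothetical image-to-prompt similarity logits for three classes.
teacher_logits = [4.0, 1.0, 0.5]  # assumed: from LLM-driven domain-adapted prompts
student_logits = [2.5, 1.5, 1.0]  # assumed: from learnable clinical prompts

loss = distillation_loss(student_logits, teacher_logits)
print(f"distillation loss: {loss:.4f}")
```

Minimizing such a loss pulls the predictions made with the learnable prompts toward those made with the domain-adapted prompts; the loss is zero when the two sets of logits agree.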

Abstract (original)

Prompt learning is one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to the biomedical image classification tasks in few-shot scenarios. However, most of the current prompt learning methods only used the text prompts and ignored the particular structures (such as the complex anatomical structures and subtle pathological features) in the biomedical images. In this work, we propose Biomed-DPT, a knowledge-enhanced dual modality prompt tuning technique. In designing the text prompt, Biomed-DPT constructs a dual prompt including the template-driven clinical prompts and the large language model (LLM)-driven domain-adapted prompts, then extracts the clinical knowledge from the domain-adapted prompts through the knowledge distillation technique. In designing the vision prompt, Biomed-DPT introduces the zero vector as a soft prompt to leverage attention re-weighting so that the focus on non-diagnostic regions and the recognition of non-critical pathological features are avoided. Biomed-DPT achieves an average classification accuracy of 66.14% across 11 biomedical image datasets covering 9 modalities and 10 organs, with performance reaching 78.06% in base classes and 75.97% in novel classes, surpassing the Context Optimization (CoOp) method by 6.20%, 3.78%, and 8.04%, respectively. Our code is available at https://github.com/Kanyooo/Biomed-DPT.
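To make the idea of a zero-vector soft prompt more concrete, the following toy example shows how one extra all-zero token changes softmax attention weights. It is a sketch under simplifying assumptions, not the authors' implementation: it uses a single query, treats the zero vector directly as a key (a real transformer would first apply a learned key projection, possibly with a bias), and the query and patch keys are hypothetical values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(logits)

# Hypothetical toy data: one query and three image-patch keys.
query = [1.0, 0.0]
patch_keys = [[2.0, 0.0], [-1.0, 0.5], [-2.0, 1.0]]

baseline = attention_weights(query, patch_keys)

# Assumption: the soft prompt is an extra token whose key is the zero vector,
# so its attention logit is always 0 regardless of the query.
zero_prompt = [0.0] * len(query)
with_prompt = attention_weights(query, patch_keys + [zero_prompt])

print("patch weights without prompt:", [round(w, 3) for w in baseline])
print("patch weights with prompt:   ", [round(w, 3) for w in with_prompt[:-1]])
print("weight absorbed by prompt:   ", round(with_prompt[-1], 3))
```

In this simplified setting the zero token always contributes a logit of 0, so it absorbs a share of the attention mass that grows when the patch logits are low and shrinks when some patch matches the query strongly. The relative ranking of the patches is unchanged; only the total weight placed on them is reduced.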

arXiv information

Authors	Wei Peng,Kang Liu,Jianchen Hu,Meng Zhang
Published	2025-05-08 12:37:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
