CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning

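Summary

Recent advances in Contrastive Language-Image Pre-training (CLIP) have demonstrated notable success in self-supervised representation learning across a variety of tasks.
However, existing CLIP-like approaches often require extensive GPU resources and long training times because of the size of the models and datasets, which makes them poorly suited to medical applications, where large datasets are not always available.
Meanwhile, the prompts given to the language model are mostly derived by hand from the labels attached to the images, so the rich information within the training samples can be overlooked.
The authors introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT), which draws on the strengths of extensively pre-trained language and visual models.
They also propose an efficient strategy for learning context-based prompts that narrows the gap between informative clinical diagnostic data and simple class labels.
The method achieves state-of-the-art performance on multiple chest X-ray and mammography datasets compared with various baselines.
Compared with the current BERT encoder, the proposed parameter-efficient framework reduces the total trainable model size by 39% and reduces the trainable language model to only 4%.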

Abstract (original)

Recent advancements in Contrastive Language-Image Pre-training (CLIP) have demonstrated notable success in self-supervised representation learning across various tasks. However, the existing CLIP-like approaches often demand extensive GPU resources and prolonged training times due to the considerable size of the model and dataset, making them poor for medical applications, in which large datasets are not always common. Meanwhile, the language model prompts are mainly manually derived from labels tied to images, potentially overlooking the richness of information within training samples. We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT) that harnesses the strengths of the extensive pre-trained language and visual models. Furthermore, we present an efficient strategy for learning context-based prompts that mitigates the gap between informative clinical diagnostic data and simple class labels. Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets compared with various baselines. The proposed parameter efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
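
The CLIP-style objective that the abstract builds on can be illustrated with a symmetric contrastive (InfoNCE) loss over paired image and text embeddings. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all assumptions made for illustration, and the abstract does not specify the exact loss used in CLEFT.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length (assumes the norm is not zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    Row i of image_embs is assumed to be paired with row i of text_embs.
    The temperature is a hypothetical default, not a value from the paper.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (
        diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_transposed)
    )


# Toy embeddings: correctly paired versus deliberately mismatched.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(toy_images, [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = symmetric_contrastive_loss(toy_images, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each image embedding matches its paired text embedding and grows large when the pairs are swapped, which is the behavior a contrastive objective relies on to pull matched pairs together.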

arXiv information

Authors	Yuexi Du, Brian Chang, Nicha C. Dvornek
Published	2024-07-30 17:57:32+00:00
arXiv page	arxiv_id(pdf)

