C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

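Summary

In deep learning, test-time adaptation has gained attention as a way to fine-tune a model without labeled data.
A prominent example is the recently proposed test-time prompt tuning for large-scale vision-language models such as CLIP.
These prompts, however, have been developed mainly to improve accuracy, overlooking calibration, which is essential for quantifying prediction uncertainty.
Traditional calibration methods, in turn, rely on substantial amounts of labeled data, which makes them impractical in test-time scenarios.
To address this, the paper studies calibration during test-time prompt tuning by leveraging the inherent properties of CLIP.
Through a series of observations, the authors find that the choice of prompt significantly affects CLIP's calibration, and that prompts leading to higher text feature dispersion yield better-calibrated predictions.
They introduce the Average Text Feature Dispersion (ATFD), establish its relationship with calibration error, and present Calibrated Test-time Prompt Tuning (C-TPT), a new method for optimizing prompts at test time with improved calibration.
Extensive experiments on different CLIP architectures and datasets show that C-TPT effectively improves the calibration of test-time prompt tuning without requiring labeled data.
The code is publicly available at https://github.com/hee-suk-yoon/C-TPT.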
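
The summary names ATFD but does not define it. As a rough illustration only, the sketch below assumes ATFD is the mean Euclidean distance between each class's text feature and the centroid of all class text features; that formula, the function name, and the toy 2-D vectors are assumptions made here, not details taken from this page. In practice the features would come from CLIP's text encoder, one per class prompt.

```python
import math

def average_text_feature_dispersion(text_features):
    """Mean L2 distance of each text feature from the centroid.

    ASSUMPTION: this definition is inferred from the metric's name;
    the summary above does not state the formula.
    """
    n = len(text_features)
    dim = len(text_features[0])
    # Centroid = coordinate-wise mean over all class text features.
    centroid = [sum(f[d] for f in text_features) / n for d in range(dim)]
    return sum(math.dist(f, centroid) for f in text_features) / n

# Hypothetical 2-D "text features" for three classes under two prompts.
tight_prompt = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]   # features clustered
spread_prompt = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # features dispersed

print(average_text_feature_dispersion(tight_prompt))
print(average_text_feature_dispersion(spread_prompt))
```

Under the summary's observation, a prompt like the second one (higher dispersion) would be expected to give better-calibrated predictions; a test-time objective could then reward higher dispersion alongside the usual prompt tuning loss, though how C-TPT combines the two is not specified here.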

Abstract (original)

In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. A prime exemplification is the recently proposed test-time prompt tuning for large-scale vision-language models such as CLIP. Unfortunately, these prompts have been mainly developed to improve accuracy, overlooking the importance of calibration, which is a crucial aspect for quantifying prediction uncertainty. However, traditional calibration methods rely on substantial amounts of labeled data, making them impractical for test-time scenarios. To this end, this paper explores calibration during test-time prompt tuning by leveraging the inherent properties of CLIP. Through a series of observations, we find that the prompt choice significantly affects the calibration in CLIP, where the prompts leading to higher text feature dispersion result in better-calibrated predictions. Introducing the Average Text Feature Dispersion (ATFD), we establish its relationship with calibration error and present a novel method, Calibrated Test-time Prompt Tuning (C-TPT), for optimizing prompts during test-time with enhanced calibration. Through extensive experiments on different CLIP architectures and datasets, we show that C-TPT can effectively improve the calibration of test-time prompt tuning without needing labeled data. The code is publicly accessible at https://github.com/hee-suk-yoon/C-TPT.
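
The abstract refers to calibration error without naming a specific metric. One widely used measure is the Expected Calibration Error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin. The sketch below is a minimal, generic version of that metric; the function name, the bin count, and the half-open binning convention are choices made here, and nothing in it is taken from the authors' code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Generic ECE: weighted mean |accuracy - confidence| over bins.

    confidences: predicted-class probabilities in [0, 1]
    correct:     booleans, True where the prediction was right
    Bins here are [k/n, (k+1)/n), with 1.0 placed in the last bin
    (an implementation choice for this sketch).
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# Four predictions at 90% confidence, three of them correct:
# the model is overconfident by 0.9 - 0.75.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                 [True, True, True, False]))
```

A lower value means confidence tracks accuracy more closely, which is the sense in which a prompt tuning method can be said to be better calibrated.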

arXiv information

Authors	Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo
Published	2024-03-31 13:36:54+00:00

Source and services used

arxiv.jp, Google
