Towards Realistic Unsupervised Fine-tuning with CLIP

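Summary

The emergence of vision-language models (VLMs) such as CLIP has spurred substantial research into applying them to downstream supervised learning tasks.
Some previous studies have explored unsupervised fine-tuning of CLIP, but they often rely on prior knowledge in the form of class names associated with the ground-truth labels.
This paper examines a realistic unsupervised fine-tuning scenario in which the unlabeled data may contain out-of-distribution samples from unknown classes.
It also stresses the importance of improving out-of-distribution detection at the same time as the recognition of instances that belong to the predefined class labels.
To address this problem, the authors present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO).
UEO uses sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances.
Besides optimizing the textual prompts, UEO also optimizes the channel-wise affine transformations within the visual branch of CLIP.
Extensive experiments across 15 domains and 4 different types of prior knowledge show that UEO surpasses baseline methods in both generalization and out-of-distribution detection.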
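
The confidence-weighted entropy objective described in this summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes that confidence is the maximum softmax probability, that confident samples are weighted in proportion to their confidence, and that less confident samples are weighted in proportion to the inverse of their confidence, with both sets of weights normalized within the mini-batch. The function name `ueo_style_loss` is hypothetical, and the exact weighting used in the paper may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def ueo_style_loss(batch_logits):
    """Illustrative confidence-weighted entropy objective for one mini-batch.

    Returns (weighted conditional entropy) - (entropy of a weighted mean
    prediction). Minimizing it sharpens confident predictions while keeping
    the averaged prediction of less confident samples spread over classes.
    """
    probs = [softmax(row) for row in batch_logits]
    # Assumption: confidence score = maximum softmax probability.
    conf = [max(p) for p in probs]

    # Conditional-entropy term: confident samples receive larger weights.
    conf_sum = sum(conf)
    cond = sum((c / conf_sum) * entropy(p) for c, p in zip(conf, probs))

    # Marginal-entropy term: less confident samples receive larger weights
    # in the averaged prediction (assumed inverse-confidence weighting).
    inv = [1.0 / c for c in conf]
    inv_sum = sum(inv)
    num_classes = len(probs[0])
    mean_pred = [
        sum((v / inv_sum) * p[k] for v, p in zip(inv, probs))
        for k in range(num_classes)
    ]
    marg = entropy(mean_pred)

    return cond - marg
```

Under these assumptions, a batch whose predictions are confident and spread across classes, such as `[[10.0, 0.0], [0.0, 10.0]]`, receives a lower loss than a batch that collapses onto a single class, such as `[[10.0, 0.0], [10.0, 0.0]]`. In an actual fine-tuning setup the logits would come from CLIP image-text similarities, and the loss would be backpropagated to the textual prompts and to the affine parameters of the visual branch.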

Summary (original)

The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data might contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels. To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.

arXiv information

Authors	Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan
Published	2023-08-24 16:47:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
