Context-Aware Robust Fine-Tuning

Summary

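Contrastive Language-Image Pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner by measuring the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]".
Because "[CONTEXT]" can carry exhaustive text cues, CLIP models are aware of different contexts, such as background, style, and viewpoint, and show unprecedented robustness against a wide range of distribution shifts.
However, recent work finds that further fine-tuning a CLIP model improves accuracy but sacrifices robustness on downstream tasks.
We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features.
To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT).
CAR-FT regularizes the model during fine-tuning so that it keeps capturing context information.
Specifically, we use zero-shot prompt weights to obtain the context distribution contained in an image.
By minimizing the Kullback-Leibler divergence (KLD) between the context distributions induced by the original and the fine-tuned CLIP models, CAR-FT carries CLIP's context-aware ability over to downstream tasks and achieves higher in-distribution (ID) and out-of-distribution (OOD) accuracy.
Experiments show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also improving accuracy on nine downstream tasks.
In addition, CAR-FT surpasses previous Domain Generalization (DG) methods, reaching 78.5% average accuracy on the DomainBed benchmark, a new state of the art.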
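
The regularizer described in the summary can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `temperature` and `alpha` parameters, the direction of the KL term, and the toy two-dimensional features are all hypothetical, and plain Python lists stand in for real CLIP features and prompt weights.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_distribution(image_feature, context_weights, temperature=1.0):
    # Softmax over similarities between one image feature and each
    # context prompt weight (one weight vector per context).
    # `temperature` is an assumed hyperparameter.
    logits = [dot(image_feature, w) / temperature for w in context_weights]
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def car_ft_style_loss(task_loss, feat_original, feat_finetuned,
                      context_weights, alpha=1.0):
    # Task loss plus a penalty that keeps the fine-tuned model's context
    # distribution close to that of the original (frozen) model.
    # The KL direction and the weight `alpha` are assumptions.
    p_ref = context_distribution(feat_original, context_weights)
    p_ft = context_distribution(feat_finetuned, context_weights)
    return task_loss + alpha * kl_divergence(p_ref, p_ft)

# Toy example: two contexts, two-dimensional features.
weights = [[1.0, 0.0], [0.0, 1.0]]
unchanged = car_ft_style_loss(0.5, [2.0, 0.0], [2.0, 0.0], weights)
drifted = car_ft_style_loss(0.5, [2.0, 0.0], [0.0, 2.0], weights)
print(unchanged, drifted)
```

When the fine-tuned feature induces the same context distribution as the original one, the penalty vanishes and only the task loss remains. The further the two distributions drift apart, the larger the added term becomes.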

Abstract (original)

Contrastive Language-Image Pre-trained (CLIP) models have zero-shot ability of classifying an image belonging to ‘[CLASS]’ by using similarity between the image and the prompt sentence ‘a [CONTEXT] of [CLASS]’. Based on exhaustive text cues in ‘[CONTEXT]’, CLIP model is aware of different contexts, e.g. background, style, viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find further fine-tuning of CLIP models improves accuracy but sacrifices the robustness on downstream tasks. We conduct an empirical investigation to show fine-tuning will corrupt the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture the context information. Specifically, we use zero-shot prompt weights to get the context distribution contained in the image. By minimizing the Kullback-Leibler Divergence (KLD) between context distributions induced by original/fine-tuned CLIP models, CAR-FT makes the context-aware ability of CLIP inherited into downstream tasks, and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. The experimental results show CAR-FT achieves superior robustness on five OOD test datasets of ImageNet, and meanwhile brings accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and gets 78.5% averaged accuracy on DomainBed benchmark, building the new state-of-the-art.
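
The zero-shot step in the first sentence of the abstract can be illustrated as follows. This is a rough sketch under assumptions: the helper names are hypothetical, the embeddings are hand-written toy vectors rather than outputs of a CLIP encoder, and cosine similarity is used as the similarity measure.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    # Cosine similarity between two vectors.
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def build_prompts(contexts, classes):
    # Fill the "a [CONTEXT] of [CLASS]" template for every class.
    return {c: [f"a {ctx} of {c}" for ctx in contexts] for c in classes}

def zero_shot_classify(image_embedding, class_prompt_embeddings):
    # Pick the class whose prompt embedding is most similar to the image.
    # `class_prompt_embeddings` maps a class label to one embedding; in a
    # real pipeline a text encoder would produce it from the prompts.
    scores = {label: cosine(image_embedding, emb)
              for label, emb in class_prompt_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

prompts = build_prompts(["photo", "sketch"], ["dog", "cat"])
print(prompts["dog"])

# Toy embeddings standing in for encoder outputs.
label, scores = zero_shot_classify(
    [0.9, 0.1], {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
)
print(label, scores)
```

No classifier is trained here: the class prompts themselves act as the classification weights, which is why the same mechanism can also be pointed at context prompts.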

arXiv information

Authors	Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li
Published	2022-11-29 13:07:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
