Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

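Summary

As interest in pretrained vision-language models such as CLIP grows, recent work has focused on adapting these models to downstream tasks.
Although the results are promising, most existing methods need labeled data for every class, an assumption that often fails in real-world applications because of long-tailed distributions and Zipf's law.
Some classes, such as newly emerging concepts, may have no labeled data at all.
To address this problem, the authors propose \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}), a plug-and-play generative approach for improving existing fine-tuning methods.
Specifically, following variational autoencoders, they introduce a generator that reconstructs visual features by feeding synthesized prompts and the corresponding class names into CLIP's text encoder.
This makes it easy to obtain synthesized features for the remaining classes, for which only the label is available.
CLIP is then fine-tuned with off-the-shelf methods on the combination of labeled and synthesized features.
Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of the approach.
The code is available at \url{https://github.com/mrflogs/SHIP}.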
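
The generator described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the text encoder is a random linear stand-in rather than CLIP, the weights are untrained, and every name and dimension (`text_encoder`, `generate_prompts`, `synthesize`, `D`, `Z`, `P`) is hypothetical. The loss, reconstruction error plus a KL term, is the standard variational autoencoder objective and is assumed here, since the abstract does not state it.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Z, P = 16, 8, 4  # feature dim, latent dim, prompt length (illustrative values)

# Stand-in for CLIP's frozen text encoder (assumption: NOT the real CLIP model).
W_text = rng.normal(size=(D, D)) / np.sqrt(D)

def text_encoder(tokens):
    """Map (P + 1, D) token embeddings to one unit-norm feature of shape (D,)."""
    pooled = tokens.mean(axis=0) @ W_text
    return pooled / np.linalg.norm(pooled)

# Hypothetical VAE weights; randomly initialised here, trained in practice.
W_mu = rng.normal(size=(D, Z)) * 0.1
W_logvar = rng.normal(size=(D, Z)) * 0.1
W_gen = rng.normal(size=(Z, P * D)) * 0.1

def encode(x):
    """Visual feature -> mean and log-variance of the latent Gaussian."""
    return x @ W_mu, x @ W_logvar

def generate_prompts(z):
    """Latent code -> P synthesized prompt token embeddings."""
    return (z @ W_gen).reshape(P, D)

def synthesize(z, class_token):
    """Synthesized prompts plus the class-name token go through the text encoder."""
    tokens = np.vstack([generate_prompts(z), class_token[None, :]])
    return text_encoder(tokens)

def vae_loss(x, class_token):
    """Standard VAE objective: squared reconstruction error plus KL to N(0, I)."""
    mu, logvar = encode(x)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(Z)  # reparameterization
    recon = synthesize(z, class_token)
    recon_loss = np.sum((x - recon) ** 2)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon_loss + kl

# Labeled class: a visual feature is available, so the loss can be computed.
base_feature = rng.normal(size=D)
base_feature /= np.linalg.norm(base_feature)
base_class_token = rng.normal(size=D)
loss = vae_loss(base_feature, base_class_token)

# Label-only class: sample latent codes from the prior and synthesize features.
new_class_token = rng.normal(size=D)
synthetic_features = np.stack(
    [synthesize(rng.standard_normal(Z), new_class_token) for _ in range(5)]
)
```

In this sketch, `synthetic_features` plays the role of the synthesized features for a class with no labeled images. These would be combined with real labeled features before fine-tuning with an existing method.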

Abstract (original)

With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf’s law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.

arXiv information

Authors	Zhengbo Wang, Jian Liang, Ran He, Nan Xu, Zilei Wang, Tieniu Tan
Published	2023-07-14 15:15:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
