AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Summary

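Large pre-trained vision-language models have recently demonstrated remarkable performance on zero-shot downstream tasks.
Building on this, recent studies such as CoOp and CoCoOp have proposed prompt learning, in which the context within a prompt is replaced with learnable vectors, yielding significant improvements over manually crafted prompts.
However, the performance gain on unseen classes remains marginal; traditional zero-shot learning techniques have frequently used data augmentation to tackle this problem.
Through our experiments, we identified an important issue in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, which harms generalization to unseen classes.
To address this problem, we propose adversarial token embedding, which disentangles low-level visual augmentation features from high-level class information when inducing bias in learnable prompts.
Through a novel mechanism called "Adding Attributes to Prompt Learning" (AAPL), we guide the learnable context to extract text features effectively by focusing on high-level features for unseen classes.
We conducted experiments across 11 datasets; overall, AAPL performs favorably compared with existing methods on few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.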
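
The prompt learning idea mentioned above (context replaced with learnable vectors) can be illustrated with a minimal sketch. This is not the AAPL, CoOp, or CoCoOp implementation: the mean-pooling `toy_text_encoder`, the vector sizes, the temperature, and all function names are assumptions made for illustration, and the gradient-based training of the context vectors is omitted. It only shows how one shared set of context vectors is prepended to each class embedding and how the resulting text features are compared with an image feature.

```python
import math
import random


def make_prompt(context, class_embedding):
    """Concatenate the shared context vectors with one class-name embedding."""
    return context + [class_embedding]


def toy_text_encoder(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool the tokens."""
    dim = len(prompt[0])
    return [sum(tok[i] for tok in prompt) / len(prompt) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def class_probabilities(image_feature, context, class_embeddings, temperature=0.01):
    """Softmax over image-text cosine similarities, one text feature per class."""
    logits = [
        cosine(image_feature, toy_text_encoder(make_prompt(context, emb))) / temperature
        for emb in class_embeddings
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed toy sizes: 8-dimensional embeddings, 4 context vectors, 3 classes.
rng = random.Random(0)
dim, n_ctx, n_classes = 8, 4, 3

# In prompt learning these context vectors would be the trainable parameters;
# here they are only initialized, not optimized.
context = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
class_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
image_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]

probs = class_probabilities(image_feature, context, class_embeddings)
```

In a real setup, the encoders would be the frozen image and text encoders of a pre-trained vision-language model, and only `context` would receive gradient updates from a classification loss on the seen classes.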

Abstract (original)

Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. Through our novel mechanism called ‘Adding Attributes to Prompt Learning’, AAPL, we guide the learnable context to effectively extract text features by focusing on high-level features for unseen classes. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.

arXiv information

Authors	Gahyeon Kim, Sohee Kim, Seokju Lee
Published	2024-04-25 17:51:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
