Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

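Summary

Advances in vision-language models (VLMs) have driven progress in computer vision, particularly in the zero-shot learning setting.
Despite their promise, these models often become less effective when the test environment introduces a domain shift.
To address this, we introduce the Test-Time Prototype Shifting (TPS) framework, a pioneering approach designed to adapt VLMs to a test dataset using unlabeled test inputs.
Our method builds on the idea of modulating per-class prototypes in the shared embedding space.
By pre-computing and caching the prototypes generated by the pre-trained text encoder, TPS allows the prototypes to be reused for subsequent predictions without further optimization, and it also integrates seamlessly with current advances in prompt engineering.
At test time, TPS dynamically learns a shift vector for each prototype based solely on the given test sample, bridging the domain gap and improving classification accuracy.
A notable aspect of the framework is that it requires significantly less memory and computation than conventional text-prompt tuning methods.
Extensive evaluation on 15 datasets covering natural distribution shifts and cross-dataset generalization demonstrates the superior performance of TPS, which achieves state-of-the-art results while reducing resource requirements.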

Summary (original)

Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting. Despite their promise, the effectiveness of these models often diminishes due to domain shifts in test environments. To address this, we introduce the Test-Time Prototype Shifting (TPS) framework, a pioneering approach designed to adapt VLMs to test datasets using unlabeled test inputs. Our method is based on the notion of modulating per-class prototypes in the shared embedding space. By pre-computing and caching prototypes generated with the pre-trained text encoder, TPS not only facilitates optimization-free prototype reuse for subsequent predictions but also enables seamless integration with current advancements in prompt engineering. At test-time, TPS dynamically learns shift vectors for each prototype based solely on the given test sample, effectively bridging the domain gap and enhancing classification accuracy. A notable aspect of our framework is its significantly reduced memory and computational demands when compared to conventional text-prompt tuning methods. Extensive evaluations across 15 datasets involving natural distribution shifts and cross-dataset generalization demonstrate TPS’s superior performance, achieving state-of-the-art results while reducing resource requirements.
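
The prototype-shifting idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state the training objective, so the sketch assumes the shift vectors are learned by minimizing the entropy of the prediction averaged over several augmented views of one test image. Random vectors stand in for the cached text-encoder prototypes and the image features, and all function names, the logit scale, and the learning rate are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def predict(prototypes, shifts, image_feats, scale):
    """Per-view class probabilities from shifted, re-normalized prototypes."""
    shifted = l2_normalize(prototypes + shifts)
    return softmax(scale * image_feats @ shifted.T)


def marginal_entropy(probs, eps=1e-12):
    """Entropy of the class distribution averaged over all views."""
    mean_p = probs.mean(axis=0)
    return float(-(mean_p * np.log(mean_p + eps)).sum())


def entropy_grad_wrt_shifts(prototypes, shifts, image_feats, scale, eps=1e-12):
    """Analytic gradient of the marginal entropy with respect to the shifts."""
    u = prototypes + shifts
    norms = np.linalg.norm(u, axis=1, keepdims=True)
    q = u / norms                                    # (C, D) unit prototypes
    probs = softmax(scale * image_feats @ q.T)       # (N, C)
    mean_p = probs.mean(axis=0)                      # (C,)
    d_mean = -(np.log(mean_p + eps) + 1.0)           # dH/d mean_p (eps ignored)
    d_probs = np.broadcast_to(d_mean / len(probs), probs.shape)
    # Backpropagate through the softmax of each view.
    d_logits = probs * (d_probs - (probs * d_probs).sum(axis=1, keepdims=True))
    d_q = scale * d_logits.T @ image_feats           # (C, D)
    # Backpropagate through the row-wise L2 normalization.
    return (d_q - (d_q * q).sum(axis=1, keepdims=True) * q) / norms


def adapt_shifts(prototypes, image_feats, scale=10.0, lr=1e-3, steps=1):
    """Learn one shift vector per class; the cached prototypes stay fixed."""
    shifts = np.zeros_like(prototypes)
    for _ in range(steps):
        grad = entropy_grad_wrt_shifts(prototypes, shifts, image_feats, scale)
        shifts -= lr * grad
    return shifts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for cached text prototypes (3 classes) and the features of
    # 4 augmented views of a single test image, all in an 8-dim space.
    cached_prototypes = l2_normalize(rng.normal(size=(3, 8)))
    view_feats = l2_normalize(rng.normal(size=(4, 8)))

    no_shift = np.zeros_like(cached_prototypes)
    before = predict(cached_prototypes, no_shift, view_feats, scale=10.0)
    shifts = adapt_shifts(cached_prototypes, view_feats)
    after = predict(cached_prototypes, shifts, view_feats, scale=10.0)
    print("entropy before:", marginal_entropy(before))
    print("entropy after: ", marginal_entropy(after))
    print("predicted class:", int(after.mean(axis=0).argmax()))
```

Only the shift array, one vector per class, is updated here. The prototypes are computed once and never pass through the text encoder again, which is the property the abstract credits for the lower memory and compute cost relative to text-prompt tuning.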

arXiv information

Authors	Elaine Sui, Xiaohan Wang, Serena Yeung-Levy
Published	2024-03-19 17:54:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
