A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models

Summary (original)

In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of possibilities to adapt vision-language foundation models at test-time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve the robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers various prompt engineering strategies, including handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially enhances average performance compared to text-space-only ensembles. Since online test-time adaptation has been shown to be effective at mitigating performance drops under distribution shift, the study extends its scope to evaluate the effectiveness of existing test-time adaptation methods that were originally designed for vision-only classification models. Through extensive experimental evaluations conducted across multiple datasets and diverse model architectures, the research demonstrates the effectiveness of these adaptation strategies. Code is available at: https://github.com/mariodoebler/test-time-adaptation
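The abstract mentions prompt ensembles and test-time adaptation methods originally built for vision-only classifiers. What follows is a minimal NumPy sketch, not code from the linked repository, assuming precomputed CLIP-style image and text embeddings (random stand-ins here). It shows a common text-space prompt ensemble (normalize each per-template text embedding, average per class, renormalize) and the mean prediction entropy that entropy-minimization methods such as TENT reduce at test time. All function names and the value `logit_scale=100.0` are hypothetical choices for illustration; the paper's vision-text-space ensemble and any gradient-based parameter updates are not implemented here.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors along `axis` to unit L2 norm."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def ensemble_text_classifier(prompt_embeddings: np.ndarray) -> np.ndarray:
    """Text-space prompt ensemble (hypothetical helper).

    prompt_embeddings: (num_classes, num_prompts, dim), one text embedding per
    (class, prompt template), e.g. "a photo of a {}." and "a sketch of a {}.".
    Returns (num_classes, dim) unit-norm class weights.
    """
    per_prompt = l2_normalize(prompt_embeddings)   # normalize each template's embedding
    return l2_normalize(per_prompt.mean(axis=1))   # average over templates, renormalize


def zero_shot_logits(image_embeddings: np.ndarray, class_weights: np.ndarray,
                     logit_scale: float = 100.0) -> np.ndarray:
    """Scaled cosine similarity between images and class weights.
    logit_scale=100.0 is an assumed temperature, not taken from the paper."""
    return logit_scale * l2_normalize(image_embeddings) @ class_weights.T


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_entropy(logits: np.ndarray) -> float:
    """Mean Shannon entropy of the softmax predictions. Entropy-minimization
    TTA methods (e.g. TENT) lower this by updating normalization-layer
    affine parameters; no such update is performed here."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())


# Usage with random stand-in embeddings (no real CLIP model involved).
rng = np.random.default_rng(0)
weights = ensemble_text_classifier(rng.normal(size=(3, 4, 8)))
images = weights[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 8))
logits = zero_shot_logits(images, weights)
print(logits.argmax(axis=1), mean_entropy(logits))
```

Normalizing each template embedding before averaging keeps every prompt on the same scale, so no single template dominates a class weight; the final renormalization makes the logits pure cosine similarities again.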

arXiv information

Authors: Mario Döbler, Robert A. Marsden, Tobias Raichle, Bin Yang
Published: 2024-09-09 17:33:57+00:00
arXiv page: arxiv_id (pdf)

Source and services used

arxiv.jp, Google
