Vocabulary-free few-shot learning for Vision-Language Models

Abstract

Recent advances in few-shot adaptation for Vision-Language Models (VLMs) have greatly expanded their ability to generalize across tasks using only a few labeled examples. However, existing approaches primarily build upon the strong zero-shot priors of these models by leveraging carefully designed, task-specific prompts. This dependence on predefined class names can restrict their applicability, especially in scenarios where exact class names are unavailable or difficult to specify. To address this limitation, we introduce vocabulary-free few-shot learning for VLMs, a setting where target class instances – that is, images – are available but their corresponding names are not. We propose Similarity Mapping (SiM), a simple yet effective baseline that classifies target instances solely based on similarity scores with a set of generic prompts (textual or visual), eliminating the need for carefully handcrafted prompts. Although conceptually straightforward, SiM demonstrates strong performance, operates with high computational efficiency (learning the mapping typically takes less than one second), and provides interpretability by linking target classes to generic prompts. We believe that our approach could serve as an important baseline for future research in vocabulary-free few-shot learning. Code is available at https://github.com/MaxZanella/vocabulary-free-FSL.
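
To make the idea of classifying from similarity scores concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the abstract does not say how the mapping is learned, so a closed-form ridge regression onto one-hot labels is assumed here purely for illustration. Random vectors stand in for VLM image and prompt embeddings, and every function name (`similarity_scores`, `fit_mapping`, `predict`, `top_prompts`) is hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def similarity_scores(image_feats, prompt_feats):
    """Cosine similarity between every image and every generic prompt."""
    return l2_normalize(image_feats) @ l2_normalize(prompt_feats).T

def fit_mapping(scores, labels, num_classes, ridge=1e-2):
    """ASSUMPTION: ridge regression from similarity scores to one-hot labels.
    Returns a (num_prompts, num_classes) matrix; no class names are used."""
    targets = np.eye(num_classes)[labels]
    gram = scores.T @ scores + ridge * np.eye(scores.shape[1])
    return np.linalg.solve(gram, scores.T @ targets)

def predict(scores, mapping):
    """Assign each instance to the class with the highest mapped score."""
    return (scores @ mapping).argmax(axis=1)

def top_prompts(mapping, class_idx, k=3):
    """Indices of the k generic prompts with the largest weight for one class."""
    return np.argsort(mapping[:, class_idx])[::-1][:k]

# Synthetic stand-ins for VLM embeddings (assumption: real features would come
# from the image and text/visual encoders of a pretrained VLM).
rng = np.random.default_rng(0)
dim, num_prompts, num_classes, shots = 64, 20, 3, 4
prompt_feats = rng.normal(size=(num_prompts, dim))
prototypes = rng.normal(size=(num_classes, dim))

def sample(n_per_class):
    """Draw noisy instances around each unnamed class prototype."""
    labels = np.repeat(np.arange(num_classes), n_per_class)
    feats = prototypes[labels] + 0.05 * rng.normal(size=(labels.size, dim))
    return feats, labels

train_feats, train_labels = sample(shots)
test_feats, test_labels = sample(10)

mapping = fit_mapping(similarity_scores(train_feats, prompt_feats),
                      train_labels, num_classes)
train_pred = predict(similarity_scores(train_feats, prompt_feats), mapping)
test_pred = predict(similarity_scores(test_feats, prompt_feats), mapping)
test_accuracy = float((test_pred == test_labels).mean())
```

In this sketch the only learned object is a small matrix over prompt similarities, which is why fitting is cheap, and inspecting its columns with `top_prompts(mapping, class_idx)` shows which generic prompts a class leans on.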

arXiv information

Authors	Maxime Zanella, Clément Fuchs, Ismail Ben Ayed, Christophe De Vleeschouwer
Published	2025-06-04 14:32:32+00:00

Source and services used

arxiv.jp, Google
