Towards Efficient and General-Purpose Few-Shot Misclassification Detection for Vision-Language Models

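Summary

Reliable predictions from classifiers are essential for deployment in high-security, dynamically changing settings.
However, modern neural networks are often overconfident in their misclassified predictions, which underscores the need for confidence estimation to detect errors.
Although existing methods have achieved results on small-scale datasets, they all require training from scratch, and no efficient and effective misclassification detection (MisD) method exists.
This paper paves the way for exploiting vision-language models (VLMs), using text information to establish an efficient and general-purpose misclassification detection framework.
Harnessing the power of VLMs, we construct FSMisD, a few-shot prompt learning framework for MisD that avoids training from scratch and thereby improves tuning efficiency.
To strengthen misclassification detection, we use adaptive pseudo-sample generation and a novel negative loss that pushes category prompts away from pseudo features, mitigating overconfidence.
We conduct comprehensive experiments with prompt learning methods and validate generalization across various datasets with domain shift.
Significant and consistent improvements demonstrate the effectiveness, efficiency, and generalizability of the approach.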
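
To make the idea of confidence estimation for error detection concrete, here is a minimal sketch of a generic baseline: score each prediction by its maximum softmax probability over image-text cosine similarities and flag low-confidence predictions as likely misclassifications. This is not the FSMisD method from the paper. The function names, the toy feature vectors, the temperature of 0.01, and the threshold are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_with_confidence(image_feature, text_features, temperature=0.01):
    """Return (predicted class index, maximum softmax probability).

    text_features holds one embedding per category prompt. The temperature
    is an assumed value, not one taken from the paper.
    """
    logits = [cosine(image_feature, t) / temperature for t in text_features]
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]


def flag_misclassification(confidence, threshold=0.9):
    """Flag a prediction as a likely error when confidence is below threshold.

    The threshold is hypothetical; in practice it would be tuned on held-out data.
    """
    return confidence < threshold


# Toy usage: two category prompts in a 2-D feature space (made-up vectors).
prompts = [[1.0, 0.0], [0.0, 1.0]]
clear_pred, clear_conf = predict_with_confidence([1.0, 0.0], prompts)
ambiguous_pred, ambiguous_conf = predict_with_confidence([1.0, 1.0], prompts)
```

An image feature aligned with one prompt receives a confidence near 1, while a feature equidistant from both prompts receives a much lower confidence and is flagged. The overconfidence problem described above is exactly the case where this baseline fails: a wrong prediction that still receives a high score.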

Abstract (original)

Reliable prediction by classifiers is crucial for their deployment in high security and dynamically changing situations. However, modern neural networks often exhibit overconfidence for misclassified predictions, highlighting the need for confidence estimation to detect errors. Despite the achievements obtained by existing methods on small-scale datasets, they all require training from scratch and there are no efficient and effective misclassification detection (MisD) methods, hindering practical application towards large-scale and ever-changing datasets. In this paper, we pave the way to exploit vision language model (VLM) leveraging text information to establish an efficient and general-purpose misclassification detection framework. By harnessing the power of VLM, we construct FSMisD, a Few-Shot prompt learning framework for MisD to refrain from training from scratch and therefore improve tuning efficiency. To enhance misclassification detection ability, we use adaptive pseudo sample generation and a novel negative loss to mitigate the issue of overconfidence by pushing category prompts away from pseudo features. We conduct comprehensive experiments with prompt learning methods and validate the generalization ability across various datasets with domain shift. Significant and consistent improvement demonstrates the effectiveness, efficiency and generalizability of our approach.

arXiv information

Authors	Fanhu Zeng, Zhen Cheng, Fei Zhu, Xu-Yao Zhang
Published	2025-03-26 12:31:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
