Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection

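Summary

Out-of-distribution (OOD) detection has advanced significantly through zero-shot approaches that leverage powerful vision-language models (VLMs) such as CLIP.
However, as our pilot study shows, prior work has focused mainly on improving Far-OOD performance, potentially at the expense of Near-OOD performance.
To address this issue, we propose a new strategy that uses large language models (LLMs) together with VLMs to strengthen zero-shot OOD detection in both Far-OOD and Near-OOD scenarios.
Our approach first uses an LLM to generate superclasses of the ID labels and corresponding background descriptions, and then extracts features with CLIP.
Next, it isolates the core semantic features of the ID data by subtracting the background features from the superclass features.
The refined representation makes it easier to select more appropriate negative labels for OOD data from WordNet's comprehensive candidate label set, which improves zero-shot OOD detection performance in both scenarios.
In addition, we introduce new few-shot prompt tuning and visual prompt tuning to adapt the proposed framework so that it aligns better with the target distribution.
Experimental results show that the proposed approach consistently outperforms current state-of-the-art methods across multiple benchmarks, improving AUROC by up to 2.9% and reducing FPR95 by up to 12.6%.
Our method also shows superior robustness to covariate shift across different domains, further underlining its effectiveness in real-world scenarios.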
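
The feature-subtraction and negative-label steps can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-dimensional vectors stand in for CLIP text embeddings, the function names are hypothetical, and the selection rule (keep the candidates least similar to the core ID features) is one plausible criterion that the abstract does not spell out. It is not the authors' implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def core_feature(superclass_feat, background_feat):
    """Subtract the (normalized) background embedding from the
    (normalized) superclass embedding and renormalize the result."""
    s = normalize(superclass_feat)
    b = normalize(background_feat)
    return normalize([si - bi for si, bi in zip(s, b)])


def select_negative_labels(core_feats, candidates, k):
    """Assumed criterion: keep the k candidate labels whose highest
    similarity to any core ID feature is lowest."""
    scored = []
    for label, feat in candidates.items():
        max_sim = max(cosine(feat, core) for core in core_feats)
        scored.append((max_sim, label))
    scored.sort()
    return [label for _, label in scored[:k]]


# Toy stand-ins for embeddings (hypothetical values, not CLIP outputs).
animal_superclass = [1.0, 1.0, 0.0]   # object semantics mixed with background
grass_background = [0.0, 1.0, 0.0]    # background only
core = core_feature(animal_superclass, grass_background)

candidate_labels = {
    "wolf": [1.0, 0.1, 0.0],
    "meadow": [0.0, 1.0, 0.1],
    "bulldozer": [0.0, 0.0, 1.0],
}
print(select_negative_labels([core], candidate_labels, k=2))
```

In this toy setup, "wolf" stays close to the core feature and is therefore not chosen as a negative label, while the two labels that share little with the object semantics are kept.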

Summary (original)

Out-of-distribution (OOD) detection has seen significant advancements with zero-shot approaches by leveraging powerful Vision-Language Models (VLMs) such as CLIP. However, prior research has predominantly focused on enhancing Far-OOD performance, while potentially compromising Near-OOD efficacy, as observed in our pilot study. To address this issue, we propose a novel strategy to enhance zero-shot OOD detection performance for both Far-OOD and Near-OOD scenarios by innovatively harnessing Large Language Models (LLMs) and VLMs. Our approach first exploits an LLM to generate superclasses of the ID labels and their corresponding background descriptions, followed by feature extraction using CLIP. We then isolate the core semantic features for ID data by subtracting background features from the superclass features. The refined representation facilitates the selection of more appropriate negative labels for OOD data from a comprehensive candidate label set of WordNet, thereby enhancing the performance of zero-shot OOD detection in both scenarios. Furthermore, we introduce novel few-shot prompt tuning and visual prompt tuning to adapt the proposed framework to better align with the target distribution. Experimental results demonstrate that the proposed approach consistently outperforms current state-of-the-art methods across multiple benchmarks, with an improvement of up to 2.9% in AUROC and a reduction of up to 12.6% in FPR95. Additionally, our method exhibits superior robustness against covariate shift across different domains, further highlighting its effectiveness in real-world scenarios.
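
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how AUROC and FPR95 are commonly computed from detector scores. It assumes that a higher score means "more likely in-distribution" and treats ID samples as the positive class; the function names and the toy scores are hypothetical and are not taken from the paper.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores higher than a random
    OOD sample, counting ties as half (pairwise formulation)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID at the threshold that
    still accepts at least 95% of the ID samples."""
    ranked = sorted(id_scores, reverse=True)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    n_keep = (95 * len(ranked) + 99) // 100
    threshold = ranked[n_keep - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Toy scores (hypothetical): one OOD sample overlaps the ID range.
id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.65, 0.3, 0.2, 0.1]
print(auroc(id_scores, ood_scores))          # 0.9375
print(fpr_at_95_tpr(id_scores, ood_scores))  # 0.25
```

A higher AUROC and a lower FPR95 both indicate better separation between ID and OOD samples, which is why the abstract reports an increase in the former and a reduction in the latter.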

arXiv information

Authors	Pei-Kang Lee, Jun-Cheng Chen, Ja-Ling Wu
Published	2025-01-09 13:36:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
