Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors

Abstract

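We present a comprehensive experimental study of pretrained feature extractors for visual out-of-distribution (OOD) detection.
We examine several setups, defined by whether labels or image captions are available, using different combinations of in- and out-distributions.
We find that (i) contrastive language-image pretrained models achieve state-of-the-art unsupervised OOD detection performance when nearest-neighbor feature similarity is used as the OOD detection score, (ii) state-of-the-art supervised OOD detection performance can be obtained without in-distribution fine-tuning, and (iii) even top-performing billion-scale vision transformers trained with natural language supervision fail to detect adversarially manipulated OOD images.
Finally, based on our experiments, we discuss whether new benchmarks for visual anomaly detection are needed.
Using the largest publicly available vision transformer, we achieve state-of-the-art performance across all $18$ reported OOD benchmarks, including an AUROC of 87.6\% (9.2\% gain, unsupervised) and 97.4\% (1.2\% gain, supervised) on the challenging task of CIFAR100 $\rightarrow$ CIFAR10 OOD detection.
The code will be open-sourced.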
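
The nearest-neighbor similarity score and the AUROC metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of cosine similarity, the default k=1, and the synthetic Gaussian features (standing in for embeddings from a pretrained image encoder such as CLIP) are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def knn_ood_score(train_feats, test_feats, k=1):
    # Mean cosine similarity of each test feature to its k most similar
    # in-distribution training features. Higher means "more in-distribution".
    train = l2_normalize(train_feats)
    test = l2_normalize(test_feats)
    sims = test @ train.T                      # shape: (n_test, n_train)
    topk = np.sort(sims, axis=1)[:, -k:]       # k largest similarities per row
    return topk.mean(axis=1)

def auroc(scores_in, scores_out):
    # Probability that a random in-distribution sample scores higher than a
    # random OOD sample, counting ties as one half.
    diff = scores_in[:, None] - scores_out[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Synthetic stand-in features (assumption): two Gaussian clusters in 32-D.
rng = np.random.default_rng(0)
dim = 32
center_in = rng.normal(size=dim)
center_out = rng.normal(size=dim)
train_feats = center_in + 0.3 * rng.normal(size=(200, dim))
test_in = center_in + 0.3 * rng.normal(size=(50, dim))
test_out = center_out + 0.3 * rng.normal(size=(50, dim))

score_in = knn_ood_score(train_feats, test_in)
score_out = knn_ood_score(train_feats, test_out)
print("AUROC:", auroc(score_in, score_out))
```

In this sketch no labels and no fine-tuning are involved: only features of in-distribution training images are needed, which corresponds to the unsupervised setting described above. With real data, `train_feats` and the test arrays would be replaced by encoder outputs.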

Abstract (original)

We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection. We examine several setups, based on the availability of labels or image captions and using different combinations of in- and out-distributions. Intriguingly, we find that (i) contrastive language-image pretrained models achieve state-of-the-art unsupervised out-of-distribution performance using nearest neighbors feature similarity as the OOD detection score, (ii) supervised state-of-the-art OOD detection performance can be obtained without in-distribution fine-tuning, (iii) even top-performing billion-scale vision transformers trained with natural language supervision fail at detecting adversarially manipulated OOD images. Finally, we argue whether new benchmarks for visual anomaly detection are needed based on our experiments. Using the largest publicly available vision transformer, we achieve state-of-the-art performance across all $18$ reported OOD benchmarks, including an AUROC of 87.6\% (9.2\% gain, unsupervised) and 97.4\% (1.2\% gain, supervised) for the challenging task of CIFAR100 $\rightarrow$ CIFAR10 OOD detection. The code will be open-sourced.

arXiv information

Authors	Felix Michels, Nikolas Adaloglou, Tim Kaiser, Markus Kollmann
Published	2023-03-10 10:02:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
