Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

Summary

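Extracting in-distribution (ID) images from noisy images scraped from the Internet is an important preprocessing step for building datasets, and it has traditionally been done manually.
Automating this preprocessing with deep learning raises two key challenges.
First, images must be collected using only the names of the ID classes, without training on ID data.
Second, as the motivation behind COCO illustrates, robust recognizers need training images that contain both ID and out-of-distribution (OOD) objects. Such images must be identified as ID too, not only images containing ID objects alone.
This paper proposes a new problem setting called zero-shot in-distribution (ID) detection. Images containing ID objects are identified as ID images (even if they also contain OOD objects), and images without ID objects are identified as OOD images, all without any training.
To solve this problem, the paper leverages CLIP's strong zero-shot capability and proposes Global-Local Maximum Concept Matching (GL-MCM), a simple and effective approach based on both global and local visual-text alignment of CLIP features.
Extensive experiments show that GL-MCM outperforms the compared methods on both multi-object datasets and the single-object ImageNet benchmark.
The code is available at https://github.com/AtsuMiyai/GL-MCM.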

Summary (original)

Extracting in-distribution (ID) images from noisy images scraped from the Internet is an important preprocessing for constructing datasets, which has traditionally been done manually. Automating this preprocessing with deep learning techniques presents two key challenges. First, images should be collected using only the name of the ID class without training on the ID data. Second, as we can see why COCO was created, it is crucial to identify images containing not only ID objects but also both ID and out-of-distribution (OOD) objects as ID images to create robust recognizers. In this paper, we propose a novel problem setting called zero-shot in-distribution (ID) detection, where we identify images containing ID objects as ID images (even if they contain OOD objects), and images lacking ID objects as OOD images without any training. To solve this problem, we leverage the powerful zero-shot capability of CLIP and present a simple and effective approach, Global-Local Maximum Concept Matching (GL-MCM), based on both global and local visual-text alignments of CLIP features. Extensive experiments demonstrate that GL-MCM outperforms comparison methods on both multi-object datasets and single-object ImageNet benchmarks. The code will be available via https://github.com/AtsuMiyai/GL-MCM.
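The abstract describes GL-MCM only at a high level. The following is a rough, non-authoritative sketch that assumes the scoring follows Maximum Concept Matching (MCM).

- **Global term.** Compute cosine similarities between an L2-normalized image feature and the ID class-name text features. Scale them by a temperature, apply a softmax over the classes, and take the maximum probability as the score.
- **Local term.** Apply the same matching to each patch (local) feature and take the maximum over patches and classes.
- **Combination.** The two terms are assumed to be added with a weight.

All function names, the weight `lam`, the temperature `tau`, and the threshold are hypothetical. Features are plain NumPy arrays rather than outputs of a real CLIP model.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def _softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis (the ID classes).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mcm_score(global_feat: np.ndarray, text_feats: np.ndarray, tau: float = 1.0) -> float:
    """Global term (assumed MCM): max softmax probability over K ID classes."""
    sims = _normalize(global_feat) @ _normalize(text_feats).T  # shape (K,)
    return float(_softmax(sims / tau).max())


def local_mcm_score(local_feats: np.ndarray, text_feats: np.ndarray, tau: float = 1.0) -> float:
    """Local term (assumed): same matching per patch, max over patches and classes."""
    sims = _normalize(local_feats) @ _normalize(text_feats).T  # shape (P, K)
    return float(_softmax(sims / tau).max())


def gl_mcm_score(global_feat, local_feats, text_feats, tau=1.0, lam=1.0) -> float:
    """Hypothetical combination: global score plus lam times the local score."""
    return mcm_score(global_feat, text_feats, tau) + lam * local_mcm_score(local_feats, text_feats, tau)


def is_in_distribution(score: float, threshold: float) -> bool:
    # Zero-shot ID detection: flag the image as ID when its score clears a threshold.
    return score >= threshold


if __name__ == "__main__":
    # Toy setup: 3 ID classes embedded as basis vectors e0, e1, e2 in 4-D space.
    text_feats = np.eye(3, 4)
    ood_dir = np.array([0.0, 0.0, 0.0, 1.0])  # direction of an OOD object
    id_dir = np.array([0.0, 1.0, 0.0, 0.0])   # direction of ID class 1

    # Multi-object image: global view dominated by the OOD object, one patch shows an ID object.
    mixed = gl_mcm_score(ood_dir, np.stack([ood_dir, id_dir]), text_feats, tau=0.01)
    # Pure OOD image: every patch and the global view point in the OOD direction.
    ood = gl_mcm_score(ood_dir, np.stack([ood_dir, ood_dir]), text_feats, tau=0.01)
    print(f"mixed={mixed:.4f} ood={ood:.4f}")
    print(is_in_distribution(mixed, 1.0), is_in_distribution(ood, 1.0))
```

In this toy check, one image has a global embedding dominated by an OOD object. It still scores high because a single patch matches an ID class, which is the multi-object case the abstract targets. A purely OOD image stays at the uniform baseline for both terms.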

arXiv information

Authors	Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa
Published	2023-08-23 13:11:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
