FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization

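Summary

Anomaly detection methods typically need a large number of normal samples from the target class for training, which limits their use in scenarios that demand rapid adaptation, such as cold start.
Zero-shot and few-shot anomaly detection do not require labeled samples from the target class in advance, which makes them a promising research direction.
Existing zero-shot and few-shot approaches often rely on powerful multimodal models, detecting and localizing anomalies by comparing image-text similarity.
However, their handcrafted generic descriptions fail to capture the diverse anomalies that can appear in different objects, and simple patch-level image-text matching often struggles to localize anomalous regions of varying shapes and sizes.
To address these issues, the paper proposes the FiLo++ method, which consists of two key components.
The first component, Fused Fine-Grained Descriptions (FusDes), uses large language models to generate anomaly descriptions for each object category, combines fixed and learnable prompt templates, and applies a runtime prompt filtering method, producing more accurate and task-specific textual descriptions.
The second component, Deformable Localization (DefLoc), integrates the vision foundation model Grounding DINO with position-enhanced text descriptions and a Multi-scale Deformable Cross-modal Interaction (MDCI) module, enabling accurate localization of anomalies of various shapes and sizes.
In addition, the authors design a position-enhanced patch matching approach to improve few-shot anomaly detection performance.
Experiments on multiple datasets show that FiLo++ achieves significant performance improvements over existing methods.
Code will be available at https://github.com/CASIA-IVA-Lab/FiLo.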
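
The patch-level image-text matching used by existing approaches can be illustrated with a minimal sketch. It is not the FiLo++ implementation: the function names, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration, and a real system would obtain patch and text features from a pretrained multimodal encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_anomaly_scores(patch_embeddings, normal_text, abnormal_text,
                         temperature=0.07):
    """Return, per patch, the softmax probability assigned to the abnormal text.

    All inputs are hypothetical feature vectors in a shared embedding space;
    the temperature is an assumed value, not one taken from the paper.
    """
    scores = []
    for patch in patch_embeddings:
        s_normal = cosine(patch, normal_text) / temperature
        s_abnormal = cosine(patch, abnormal_text) / temperature
        # Two-way softmax, shifted by the max for numerical stability.
        shift = max(s_normal, s_abnormal)
        e_normal = math.exp(s_normal - shift)
        e_abnormal = math.exp(s_abnormal - shift)
        scores.append(e_abnormal / (e_normal + e_abnormal))
    return scores


# Toy embeddings standing in for real encoder features (assumption).
normal_text = [1.0, 0.0]
abnormal_text = [0.0, 1.0]
patches = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

scores = patch_anomaly_scores(patches, normal_text, abnormal_text)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Because each patch is scored independently on a fixed grid, a sketch like this has no way to adapt to the shape or extent of an anomalous region, which is the limitation the summary attributes to simple patch-level matching.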

Summary (original)

Anomaly detection methods typically require extensive normal samples from the target class for training, limiting their applicability in scenarios that require rapid adaptation, such as cold start. Zero-shot and few-shot anomaly detection do not require labeled samples from the target class in advance, making them a promising research direction. Existing zero-shot and few-shot approaches often leverage powerful multimodal models to detect and localize anomalies by comparing image-text similarity. However, their handcrafted generic descriptions fail to capture the diverse range of anomalies that may emerge in different objects, and simple patch-level image-text matching often struggles to localize anomalous regions of varying shapes and sizes. To address these issues, this paper proposes the FiLo++ method, which consists of two key components. The first component, Fused Fine-Grained Descriptions (FusDes), utilizes large language models to generate anomaly descriptions for each object category, combines both fixed and learnable prompt templates and applies a runtime prompt filtering method, producing more accurate and task-specific textual descriptions. The second component, Deformable Localization (DefLoc), integrates the vision foundation model Grounding DINO with position-enhanced text descriptions and a Multi-scale Deformable Cross-modal Interaction (MDCI) module, enabling accurate localization of anomalies with various shapes and sizes. In addition, we design a position-enhanced patch matching approach to improve few-shot anomaly detection performance. Experiments on multiple datasets demonstrate that FiLo++ achieves significant performance improvements compared with existing methods. Code will be available at https://github.com/CASIA-IVA-Lab/FiLo.
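
For the few-shot setting, one common baseline is to compare patches of a query image against patches from the few available normal images. The sketch below shows only that plain nearest-neighbor matching. The abstract does not say how the position enhancement works, so it is not reproduced here; the function names and toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def few_shot_patch_scores(query_patches, normal_patches):
    """Score each query patch as 1 minus its best cosine match among normal patches.

    A patch that closely resembles some normal patch scores near 0;
    a patch unlike every normal patch scores near 1.
    """
    return [
        1.0 - max(cosine(query, normal) for normal in normal_patches)
        for query in query_patches
    ]


# Toy features standing in for patches of the few normal reference images.
normal_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Toy features for the patches of one query image.
query = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.7, 0.0]]

patch_scores = few_shot_patch_scores(query, normal_bank)
# A simple image-level score: the most anomalous patch decides.
image_score = max(patch_scores)
```

Taking the maximum over patches as the image-level score is one simple choice among several; it is used here only to keep the example short.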

arXiv information

Authors	Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang
Published	2025-01-17 09:38:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
