Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models

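Summary

Recent advances in industrial visual anomaly detection have demonstrated excellent performance in identifying and segmenting anomalous regions while maintaining fast inference. However, anomaly classification (distinguishing between different types of anomalies) remains largely unexplored, even though it is critical in real-world inspection tasks. To address this gap, we propose VELM, a new LLM-based pipeline for anomaly classification. Because inference speed is critical, we first apply an unsupervised anomaly detection method as a vision expert to assess whether an observation is normal. If an anomaly is detected, the LLM classifies its type. A key challenge in developing and evaluating anomaly classification models is the lack of precise anomaly class annotations in existing datasets. To address this limitation, we introduce MVTec-AC and VisA-AC, refined versions of the widely used MVTec-AD and VisA datasets that include accurate anomaly class labels for rigorous evaluation. Our approach achieves state-of-the-art anomaly classification accuracy of 80.4% on MVTec-AD, exceeding prior baselines by 5%, and 84% on MVTec-AC, demonstrating the effectiveness of VELM in understanding and categorizing anomalies. We hope that our methodology and benchmarks will stimulate further research on anomaly classification and help bridge the gap between anomaly detection and comprehensive anomaly characterization.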

Summary (original)

Recent advances in visual industrial anomaly detection have demonstrated exceptional performance in identifying and segmenting anomalous regions while maintaining fast inference speeds. However, anomaly classification (distinguishing different types of anomalies) remains largely unexplored despite its critical importance in real-world inspection tasks. To address this gap, we propose VELM, a novel LLM-based pipeline for anomaly classification. Given the critical importance of inference speed, we first apply an unsupervised anomaly detection method as a vision expert to assess the normality of an observation. If an anomaly is detected, the LLM then classifies its type. A key challenge in developing and evaluating anomaly classification models is the lack of precise annotations of anomaly classes in existing datasets. To address this limitation, we introduce MVTec-AC and VisA-AC, refined versions of the widely used MVTec-AD and VisA datasets, which include accurate anomaly class labels for rigorous evaluation. Our approach achieves a state-of-the-art anomaly classification accuracy of 80.4% on MVTec-AD, exceeding the prior baselines by 5%, and 84% on MVTec-AC, demonstrating the effectiveness of VELM in understanding and categorizing anomalies. We hope our methodology and benchmark inspire further research in anomaly classification, helping bridge the gap between detection and comprehensive anomaly characterization.
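The detect-then-classify flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `inspect`, `Inspection`, `score_fn`, `classify_fn`, and the `threshold` value are all hypothetical, and the detector and LLM are replaced by toy stand-in functions. It only shows the gating idea, where the (more expensive) classifier runs solely on samples the detector flags as anomalous.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Inspection:
    is_anomalous: bool
    score: float
    anomaly_type: Optional[str]  # None when the sample is judged normal


def inspect(
    image: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    classify_fn: Callable[[Sequence[float]], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Inspection:
    """Stage 1: score normality. Stage 2: classify only if anomalous."""
    score = score_fn(image)
    if score < threshold:
        # Normal sample: skip the classifier entirely to save inference time.
        return Inspection(False, score, None)
    return Inspection(True, score, classify_fn(image))


# Toy stand-ins (assumptions): a real system would plug in an unsupervised
# anomaly detector for score_fn and a multi-modal LLM call for classify_fn.
def toy_score(image: Sequence[float]) -> float:
    return max(image)


def toy_classify(image: Sequence[float]) -> str:
    return "scratch" if image.index(max(image)) == 0 else "contamination"


normal = inspect([0.1, 0.2], toy_score, toy_classify)
defect = inspect([0.9, 0.2], toy_score, toy_classify)
```

Here `normal.anomaly_type` stays `None` because the classifier is never invoked, while `defect` carries the label returned by the stand-in classifier.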

arXiv information

Authors	Sassan Mokhtar,Arian Mousakhan,Silvio Galesso,Jawad Tayyub,Thomas Brox
Published	2025-05-05 13:08:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
