Enhancing Novel Object Detection via Cooperative Foundational Models

Summary

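This work tackles the challenging, emerging problem of novel object detection (NOD), focusing on accurately detecting both known and novel object categories during inference.
Traditional object detection algorithms are inherently closed-set, which limits their ability to handle NOD.
We introduce a new approach that transforms existing closed-set detectors into open-set detectors.
This transformation is achieved by exploiting the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through a cooperative mechanism.
Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance.
Our method achieves 17.42 mAP on novel objects and 42.08 mAP on known objects on the challenging LVIS dataset.
Adapting our approach to the COCO OVD split, we surpass the current state of the art on novel classes by a margin of 7.2 $ \text{AP}_{50} $.
Our code is available at https://rohit901.github.io/coop-foundation-models/.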
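The abstract does not say exactly how CLIP is used inside the cooperative mechanism. As a generic illustration only, and not the authors' method, the sketch below shows the standard CLIP-style zero-shot step of labeling detected regions by cosine similarity to text embeddings of class names. This step is what allows a detector's vocabulary to include novel classes. The embeddings are hypothetical toy vectors standing in for CLIP image and text features. The helper `zero_shot_label` and its `temperature` default are assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_label(region_embeds, text_embeds, class_names, temperature=0.01):
    """Hypothetical helper: label each region with its most similar class prompt.

    region_embeds: (R, D) toy stand-ins for CLIP image features of box crops.
    text_embeds:   (C, D) toy stand-ins for CLIP text features of class prompts.
    temperature:   assumed softmax temperature, not taken from the paper.
    """
    # (R, C) matrix of cosine similarities between regions and class prompts.
    sims = l2_normalize(region_embeds) @ l2_normalize(text_embeds).T
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)  # softmax over the class vocabulary
    best = probs.argmax(axis=1)
    return [class_names[i] for i in best], probs.max(axis=1)


# Toy vocabulary: "cat" and "dog" play the known classes, "giraffe" a novel one.
class_names = ["cat", "dog", "giraffe"]
text_embeds = np.eye(3)  # one orthogonal toy embedding per class name
region_embeds = np.array([
    [0.9, 0.1, 0.0],   # region that resembles "cat"
    [0.0, 0.2, 0.95],  # region that resembles "giraffe"
])
labels, scores = zero_shot_label(region_embeds, text_embeds, class_names)
```

In this toy setup, extending the vocabulary to a new class only requires adding its name and text embedding. The abstract does not state whether the paper's pipeline scores regions in exactly this way.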

Summary (original)

In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $ \text{AP}_{50} $ for novel classes. Our code is available at https://rohit901.github.io/coop-foundation-models/ .
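The COCO OVD result is reported in $ \text{AP}_{50} $. Under this metric, a detection counts as a true positive when its IoU with a ground-truth box of the same class is at least 0.5 and no higher-scoring detection has already claimed that box. The sketch below is a minimal single-class, single-image illustration that uses all-point (VOC-style) interpolation. It is not the official COCO evaluator, which uses 101 recall points, aggregates over images and classes, and handles crowd annotations. The helper names `iou` and `ap50` are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap50(detections, ground_truths, iou_thr=0.5):
    """Hypothetical helper: single-class, single-image AP at IoU >= 0.5.

    detections:    list of (score, box) pairs.
    ground_truths: list of boxes of the same class.
    Returns 0.0 when either list is empty (a simplifying assumption).
    """
    if not detections or not ground_truths:
        return 0.0
    dets = sorted(detections, key=lambda d: d[0], reverse=True)  # highest score first
    claimed = [False] * len(ground_truths)
    tp = np.zeros(len(dets))
    fp = np.zeros(len(dets))
    for i, (_, box) in enumerate(dets):
        ious = [iou(box, g) for g in ground_truths]
        j = int(np.argmax(ious))  # best-overlapping ground-truth box
        if ious[j] >= iou_thr and not claimed[j]:
            claimed[j] = True
            tp[i] = 1.0
        else:
            fp[i] = 1.0  # low overlap, or a duplicate of an already-claimed box
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / len(ground_truths)
    precision = tp_cum / (tp_cum + fp_cum)
    # All-point interpolation: make precision non-increasing in recall,
    # then sum the area under the resulting step curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (0, 0, 10, 10)),    # exact match: true positive
    (0.8, (50, 50, 60, 60)),  # no overlap: false positive
    (0.7, (21, 21, 30, 30)),  # IoU 0.81 with the second box: true positive
]
example_ap = ap50(detections, ground_truths)  # 0.5 * 1.0 + 0.5 * (2/3) = 5/6
```

The false positive at score 0.8 lowers precision at full recall to 2/3, which is why this toy example scores 5/6 rather than 1.0.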

arXiv information

Authors	Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Published	2024-12-05 16:34:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
