Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

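Summary

Novel Instance Detection and Segmentation (NIDS) aims to detect and segment novel object instances given a few examples of each instance. We propose a unified, simple, and effective framework (NIDS-Net) consisting of object proposal generation, embedding generation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advances in large vision models, we use Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We use the foreground feature average of patch embeddings from a DINOv2 ViT backbone, which is then refined by a weight adapter mechanism that we introduce. We show experimentally that our weight adapter adjusts the embeddings locally within their feature space and effectively suppresses overfitting in the few-shot setting. This enables a straightforward matching strategy and yields substantial performance gains. Our framework outperforms current state-of-the-art methods, with notable improvements of 22.3, 46.2, 10.3, and 24.0 average precision (AP) on four detection datasets. On the instance segmentation task of the seven core datasets of the BOP challenge, our method is about 4.5 times faster than the leading published RGB method and outperforms it by 3.6 AP. NIDS-Net is about 5.7 times faster than the top RGB-D method while maintaining competitive performance. Project page: https://irvlutd.github.io/NIDSNet/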
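The summary describes building an instance embedding by averaging DINOv2 patch embeddings over the object foreground and then refining it with a weight adapter. The sketch below illustrates both steps with toy vectors in plain Python. It is not the authors' implementation: the boolean patch mask (assumed to come from a SAM mask resized to the patch grid), the adapter's form (a two-layer MLP whose sigmoid outputs rescale each channel, blended with the input by a residual ratio `alpha`), and all names (`foreground_average`, `WeightAdapter`, `alpha`) are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def foreground_average(patch_embeddings: Sequence[Vector], fg_mask: Sequence[bool]) -> Vector:
    """Mean of the patch embeddings whose patch lies inside the object foreground."""
    fg = [e for e, keep in zip(patch_embeddings, fg_mask) if keep]
    if not fg:
        raise ValueError("foreground mask selects no patches")
    dim = len(fg[0])
    return [sum(e[d] for e in fg) / len(fg) for d in range(dim)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class WeightAdapter:
    """Hypothetical adapter, not the paper's exact formulation.

    A two-layer MLP predicts per-channel gates in (0, 1) that rescale the
    input embedding; the gated vector is blended with the original through
    the residual ratio alpha, so the refined embedding stays close to the
    input (a "local" adjustment in feature space).
    """

    def __init__(self, dim: int, hidden: int, alpha: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        # Random small weights stand in for trained parameters.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
        self.alpha = alpha

    def __call__(self, f: Vector) -> Vector:
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * x for w, x in zip(row, f))) for row in self.w1]
        # One sigmoid gate per input channel.
        gates = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, h)))) for row in self.w2]
        adapted = [g * x for g, x in zip(gates, f)]
        # Residual blend keeps the output near the original embedding.
        return [self.alpha * a + (1.0 - self.alpha) * x for a, x in zip(adapted, f)]


# Usage with three toy 3-D "patch embeddings"; the last patch is background.
patches = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [9.0, 9.0, 9.0]]
mask = [True, True, False]
avg = foreground_average(patches, mask)  # → [2.0, 1.0, 1.0]
instance_embedding = l2_normalize(WeightAdapter(dim=3, hidden=4)(avg))
```

Averaging only over foreground patches keeps background clutter out of the instance descriptor, and the residual ratio gives one knob for how far a (trained) adapter may move an embedding away from the frozen backbone feature, which is one plausible way to limit overfitting when only a few templates are available.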

Summary (original)

Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified, simple yet effective framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting in the few-shot setting. This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method is around 4.5 times faster than the leading published RGB method and surpasses it by 3.6 AP. NIDS-Net is about 5.7 times faster than the top RGB-D method while maintaining competitive performance. Project Page: https://irvlutd.github.io/NIDSNet/
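The abstract states that good embeddings make "a straightforward matching strategy" possible for assigning instance labels to proposals. A minimal sketch of one such strategy follows, assuming cosine similarity, scoring each instance by its best-matching template view, and a background threshold; these choices and the names `assign_labels` and `threshold` are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def assign_labels(
    proposals: Sequence[Vector],
    templates: Dict[str, Sequence[Vector]],
    threshold: float = 0.5,  # hypothetical background cutoff
) -> List[Tuple[Optional[str], float]]:
    """Return (label, score) per proposal; label is None below the threshold."""
    results: List[Tuple[Optional[str], float]] = []
    for p in proposals:
        best_label: Optional[str] = None
        best_score = -1.0
        for label, embs in templates.items():
            # An instance scores as high as its best-matching template view.
            score = max(cosine(p, t) for t in embs)
            if score > best_score:
                best_label, best_score = label, score
        results.append((best_label if best_score >= threshold else None, best_score))
    return results


# Usage: two instances with a few template embeddings each, three proposals.
templates = {"mug": [[1.0, 0.0], [0.9, 0.1]], "box": [[0.0, 1.0]]}
proposals = [[0.95, 0.05], [0.1, 1.0], [-1.0, 0.0]]
labels = [label for label, _ in assign_labels(proposals, templates)]  # → ['mug', 'box', None]
```

Because every proposal is compared against every template, the cost grows with the number of proposals times the number of templates; with unit-normalized embeddings this reduces to a single matrix product in a vectorized implementation.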

arXiv information

Authors	Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang
Published	2024-12-02 19:51:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
