Boosting Segment Anything Model Towards Open-Vocabulary Learning

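Summary

The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showing strong zero-shot generalization and flexible prompting.
Although SAM has been applied and adapted in many domains, its main limitation is that it cannot grasp object semantics.
In this paper, we present Sambor, which integrates SAM with an open-vocabulary object detector in an end-to-end framework.
Sambor retains all of SAM's capabilities while adding the ability to detect arbitrary objects from human inputs such as category names or referring expressions.
To achieve this, we introduce a new SideFormer module that extracts SAM features to facilitate zero-shot object localization and injects comprehensive semantic information for open-vocabulary recognition.
We also devise an open-set region proposal network (Open-set RPN), which lets the detector obtain the open-set proposals generated by SAM.
Sambor shows superior zero-shot performance across benchmarks, including COCO and LVIS, and is highly competitive with previous SoTA methods.
We hope this work is a meaningful step toward enabling SAM to recognize diverse object categories and toward advancing open-vocabulary learning with the support of vision foundation models.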

Abstract (original)

The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting. Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics. In this paper, we present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework. While retaining all the remarkable capabilities inherent to SAM, we enhance it with the capacity to detect arbitrary objects based on human inputs like category names or reference expressions. To accomplish this, we introduce a novel SideFormer module that extracts SAM features to facilitate zero-shot object localization and inject comprehensive semantic information for open-vocabulary recognition. In addition, we devise an open-set region proposal network (Open-set RPN), enabling the detector to acquire the open-set proposals generated by SAM. Sambor demonstrates superior zero-shot performance across benchmarks, including COCO and LVIS, proving highly competitive against previous SoTA methods. We aspire for this work to serve as a meaningful endeavor in endowing SAM to recognize diverse object categories and advancing open-vocabulary learning with the support of vision foundation models.

arXiv information

Authors	Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian
Published	2023-12-06 17:19:00+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
