T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Abstract

We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual examples, but fall short in conveying the abstract concept of objects as effectively as text prompts. Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning. T-Rex2 accepts inputs in diverse formats, including text prompts, visual prompts, and the combination of both, so that it can handle different scenarios by switching between the two prompt modalities. Comprehensive experiments demonstrate that T-Rex2 exhibits remarkable zero-shot object detection capabilities across a wide spectrum of scenarios. We show that text prompts and visual prompts can benefit from each other within the synergy, which is essential to cover massive and complicated real-world scenarios and pave the way towards generic object detection. Model API is now available at https://github.com/IDEA-Research/T-Rex.
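
The abstract states that the two prompt types are brought together "through contrastive learning" but gives no formula. As a rough illustration of the general idea only, the sketch below computes a generic InfoNCE-style loss that pulls each visual prompt embedding toward the text embedding of the same category and pushes it away from the others. This is not the authors' loss or code: the function names, the toy 2-D embeddings, and the temperature value of 0.07 are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_alignment_loss(visual_embs, text_embs, temperature=0.07):
    """Generic InfoNCE-style loss (hypothetical helper, not from the paper).

    visual_embs[i] and text_embs[i] are assumed to describe the same category;
    every other text embedding in the batch acts as a negative.
    """
    visual = [l2_normalize(v) for v in visual_embs]
    text = [l2_normalize(t) for t in text_embs]
    total = 0.0
    for i, v in enumerate(visual):
        logits = [dot(v, t) / temperature for t in text]
        # Numerically stable log-sum-exp over all candidate text embeddings.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching text embedding as the target.
        total += log_sum_exp - logits[i]
    return total / len(visual)


# Toy embeddings (assumed): two categories in a 2-D space.
visual_prompts = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
mismatched_text = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_alignment_loss(visual_prompts, matched_text)
misaligned_loss = contrastive_alignment_loss(visual_prompts, mismatched_text)
```

With these toy inputs the loss is close to zero when each visual prompt matches its paired text embedding and much larger when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.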

arXiv information

Authors	Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang
Published	2024-03-21 17:57:03+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
