An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

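Summary

Grounding-DINO is a state-of-the-art open-set detection model that addresses multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).
Because of its effectiveness, it has been widely adopted as a mainstream architecture for a variety of downstream applications.
Despite its importance, however, the original Grounding-DINO model lacks comprehensive public technical details because its training code has not been released.
To fill this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox.
It uses a wide range of vision datasets for pre-training and a variety of detection and grounding datasets for fine-tuning.
We provide a comprehensive analysis of each reported result, together with the detailed settings needed to reproduce it.
Extensive experiments on the benchmarks above show that MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline.
We release all of our models to the research community.
The code and trained models are available at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.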

Summary (original)

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline, which is built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. The extensive experiments on the benchmarks mentioned demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community. Code and trained models are released at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.

arXiv information

Authors	Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang
Published	2024-01-05 06:21:19+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
