Exposing the Troublemakers in Described Object Detection

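Summary

Detecting objects from language descriptions is a popular task that includes Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC).
This paper advances both to a more practical setting called Described Object Detection (DOD), which extends OVD's category names to flexible language expressions and removes REC's limitation of grounding only pre-existing objects.
We establish a research foundation for the DOD task by constructing the Description Detection Dataset ($D^3$), which features flexible language expressions and annotates every described object without omission.
Evaluating previous SOTA methods on $D^3$ reveals several troublemakers that cause current REC, OVD, and bi-functional methods to fail.
REC methods struggle with confidence scores, with rejecting negative instances, and with multi-target scenarios, while OVD methods are constrained by long and complex descriptions.
Recent bi-functional methods also perform poorly on DOD because their training procedures and inference strategies for the REC and OVD tasks are kept separate.
Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, and that outperforms existing methods.
Data and code are available at https://github.com/shikras/d-cube.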

Abstract (original)

Detecting objects based on language descriptions is a popular task that includes Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called Described Object Detection (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation of REC to only grounding the pre-existing object. We establish the research foundation for DOD tasks by constructing a Description Detection Dataset ($D^3$), featuring flexible language expressions and annotating all described objects without omission. By evaluating previous SOTA methods on $D^3$, we find some troublemakers that fail current REC, OVD, and bi-functional methods. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods face constraints with long and complex descriptions. Recent bi-functional methods also do not work well on DOD due to their separated training procedures and inference strategies for REC and OVD tasks. Building upon the aforementioned findings, we propose a baseline that largely improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube.
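
The contrast the abstract draws between REC-style output and the DOD setting can be illustrated with a small sketch. This is not the authors' baseline or the $D^3$ evaluation code; the `Candidate` type, the two function names, and the 0.5 threshold are all hypothetical and chosen only for illustration. The sketch assumes a model has already scored candidate boxes against one description, and shows why always returning the top-scoring box cannot reject a description that matches nothing or return several matching objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A hypothetical scored box for one description (illustration only)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float  # assumed confidence that the box matches the description


def rec_style(candidates: List[Candidate]) -> List[Candidate]:
    """REC-style output: return the single top-scoring box when any exist."""
    if not candidates:
        return []
    return [max(candidates, key=lambda c: c.score)]


def dod_style(candidates: List[Candidate], threshold: float = 0.5) -> List[Candidate]:
    """DOD-style output: keep every box at or above the threshold.

    The result may be empty (the description matches nothing in the image)
    or contain several boxes (multi-target description).
    """
    return [c for c in candidates if c.score >= threshold]


# Description that matches nothing in the image: all scores are low.
absent = [Candidate((0, 0, 10, 10), 0.2), Candidate((5, 5, 20, 20), 0.1)]

# Description that matches two objects in the image.
multi = [
    Candidate((0, 0, 10, 10), 0.9),
    Candidate((30, 30, 50, 50), 0.8),
    Candidate((5, 5, 20, 20), 0.1),
]

print(len(rec_style(absent)), len(dod_style(absent)))
print(len(rec_style(multi)), len(dod_style(multi)))
```

Under these assumptions, the top-1 rule returns one box in both cases, while the thresholded rule returns no box for the absent description and two boxes for the multi-target one. The thresholded rule only works if the confidence scores are meaningful, which is one of the weaknesses the abstract attributes to REC methods.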

arXiv information

Authors	Chi Xie, Zhao Zhang, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang
Published	2023-07-24 14:06:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
