Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Summary

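Leveraging large-scale data improves performance on many computer vision tasks. Unfortunately, the same does not hold in object detection when a single model is trained jointly on multiple datasets. We observe two main obstacles: differences in taxonomy and inconsistencies in bounding box annotation. Both introduce domain gaps across datasets and hinder joint training. In this paper, we show that these two challenges can be addressed effectively simply by adapting, per dataset, the object queries on the language embeddings of categories. We design a detection hub that dynamically adapts queries on category embeddings based on the differing distributions of the datasets. Whereas previous methods attempted to learn a single joint embedding for all datasets, our method uses the language embeddings as semantic centers for common categories while learning a semantic bias toward the specific categories of each dataset, which lets it handle annotation differences and compensate for the domain gaps. These improvements make it possible to train a single detector end-to-end on multiple datasets simultaneously and take full advantage of them. Further experiments on joint training over multiple datasets demonstrate significant performance gains over separately fine-tuned detectors.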
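The core idea, a shared language embedding acting as the semantic center of each category plus a learned per-dataset bias, can be illustrated with a toy sketch. This is not the authors' implementation: the three-dimensional vectors, the dataset names `dataset_a` and `dataset_b`, and the helpers `adapted_queries` and `classify` are all hypothetical, and the bias values are fixed by hand here rather than learned.

```python
# Toy illustration only: every name, vector, and value below is hypothetical.
from math import sqrt


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


# Stand-ins for language embeddings of category names,
# shared by all datasets as semantic centers.
LANGUAGE_EMBEDDING = {
    "person": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
}

# Stand-ins for per-dataset semantic biases. In a real system these
# would be learned during joint training; here they are hand-picked.
DATASET_BIAS = {
    "dataset_a": {"person": [0.0, 0.0, 0.2], "car": [0.1, 0.0, 0.0]},
    "dataset_b": {"person": [0.0, 0.1, -0.2], "car": [0.0, 0.0, 0.0]},
}


def adapted_queries(dataset):
    """Return one query per category: shared center plus dataset bias."""
    return {
        category: add(center, DATASET_BIAS[dataset][category])
        for category, center in LANGUAGE_EMBEDDING.items()
    }


def classify(feature, dataset):
    """Pick the category whose adapted query is most similar to the feature."""
    queries = adapted_queries(dataset)
    return max(queries, key=lambda category: cosine(feature, queries[category]))


if __name__ == "__main__":
    print(adapted_queries("dataset_a"))
    print(classify([0.9, 0.1, 0.3], "dataset_a"))
```

The same category name yields a different query for each dataset, while both stay close to the shared language embedding. That is the intuition behind handling annotation differences without giving up a common label space.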

Abstract (original)

Leveraging large-scale data can introduce performance gains on many computer vision tasks. Unfortunately, this does not happen in object detection when training a single model under multiple datasets together. We observe two main obstacles: taxonomy difference and bounding box annotation inconsistency, which introduces domain gaps in different datasets that prevents us from joint training. In this paper, we show that these two challenges can be effectively addressed by simply adapting object queries on language embedding of categories per dataset. We design a detection hub to dynamically adapt queries on category embedding based on the different distributions of datasets. Unlike previous methods attempted to learn a joint embedding for all datasets, our adaptation method can utilize the language embedding as semantic centers for common categories, while learning the semantic bias towards specific categories belonging to different datasets to handle annotation differences and make up the domain gaps. These novel improvements enable us to end-to-end train a single detector on multiple datasets simultaneously to fully take their advantages. Further experiments on joint training on multiple datasets demonstrate the significant performance gains over separate individual fine-tuned detectors.

arXiv information

Authors	Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, Jianfeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
Published	2022-06-07 17:59:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
