Language-conditioned Detection Transformer

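Abstract

We introduce a new open-vocabulary detection framework.
The framework uses image-level labels together with detailed detection annotations where they are available.
It proceeds in three steps.
First, we train a language-conditioned object detector on fully supervised detection data.
During training, this detector sees which ground-truth classes are present or absent, and conditions its predictions on the set of present classes.
Second, we use this detector to pseudo-label images that have only image-level labels.
Thanks to its conditioning mechanism, our detector produces much more accurate pseudo-labels than prior approaches.
Finally, we train an unconditioned open-vocabulary detector on the pseudo-annotated images.
The resulting detector, named DECOLA, shows strong zero-shot performance on the open-vocabulary LVIS benchmark and on direct zero-shot transfer benchmarks on LVIS, COCO, Object365, and OpenImages.
DECOLA outperforms prior work by 17.1 AP-rare and 9.4 mAP on the zero-shot LVIS benchmark.
DECOLA achieves state-of-the-art results across model sizes, architectures, and datasets while training only on open-source data with academic-scale compute.
Code is available at https://github.com/janghyuncho/DECOLA.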
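
The pseudo-labeling step can be illustrated with a small sketch. This is not the DECOLA implementation (see the repository above for that); it only imitates the effect of conditioning by restricting predictions to classes that the image-level labels say are present. The `Detection` type, the `pseudo_label` function, the `min_score` threshold, and the example scores are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    """One candidate box with hypothetical per-class confidence scores."""
    box: Box
    scores: Dict[str, float]


def pseudo_label(
    detections: Sequence[Detection],
    image_labels: Sequence[str],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, Box, float]]:
    """Keep, for each class named by the image-level labels, the best box.

    Classes that are not in image_labels are never considered, no matter
    how confident the detector is about them.
    """
    labels = []
    for cls in image_labels:
        # Highest-scoring candidate for this present class (None if no boxes).
        best = max(detections, key=lambda d: d.scores.get(cls, 0.0), default=None)
        if best is None:
            continue
        score = best.scores.get(cls, 0.0)
        if score >= min_score:
            labels.append((cls, best.box, score))
    return labels


# Usage: the image is tagged "cat" and "dog" only.
dets = [
    Detection((10, 10, 50, 50), {"cat": 0.9, "dog": 0.2}),
    Detection((60, 20, 120, 90), {"cat": 0.1, "dog": 0.7, "sofa": 0.95}),
]
print(pseudo_label(dets, ["cat", "dog"]))
```

In this toy example the second box scores highest for "sofa", but because "sofa" is not among the image-level labels it is ignored and the box is assigned to "dog" instead. The boxes kept this way would then serve as training targets for the unconditioned detector in the final step.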

Abstract (original)

We present a new open-vocabulary detection framework. Our framework uses both image-level labels and detailed detection annotations when available. Our framework proceeds in three steps. We first train a language-conditioned object detector on fully-supervised detection data. This detector gets to see the presence or absence of ground truth classes during training, and conditions prediction on the set of present classes. We use this detector to pseudo-label images with image-level labels. Our detector provides much more accurate pseudo-labels than prior approaches with its conditioning mechanism. Finally, we train an unconditioned open-vocabulary detector on the pseudo-annotated images. The resulting detector, named DECOLA, shows strong zero-shot performance in open-vocabulary LVIS benchmark as well as direct zero-shot transfer benchmarks on LVIS, COCO, Object365, and OpenImages. DECOLA outperforms the prior arts by 17.1 AP-rare and 9.4 mAP on zero-shot LVIS benchmark. DECOLA achieves state-of-the-art results in various model sizes, architectures, and datasets by only training on open-sourced data and academic-scale computing. Code is available at https://github.com/janghyuncho/DECOLA.

arXiv information

Authors	Jang Hyun Cho, Philipp Krähenbühl
Published	2023-11-29 18:53:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
