Million-scale Object Detection with Large Vision Model

Abstract

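Over the past few years, interest has grown in developing broad, general-purpose computer vision systems.
Such a system could solve a wide range of vision tasks simultaneously, without being restricted to a specific problem or data domain.
This is crucial for practical, real-world computer vision applications.
This study focuses on million-scale, multi-domain universal object detection, which poses several challenges, including duplicated category labels across datasets, label conflicts, and the need to handle hierarchical taxonomies.
A further ongoing challenge in the field is finding a resource-efficient way to leverage large pre-trained vision models for million-scale cross-dataset object detection.
To address these challenges, we present our approach to label handling, hierarchy-aware loss design, and resource-efficient model training using a pre-trained large model.
Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022).
We hope this detailed study will serve as a useful reference and an alternative approach for similar problems in the computer vision community.
The code is available at https://github.com/linfeng93/Large-UniDet.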

Abstract (original)

Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system. Such a system would have the potential to solve a wide range of vision tasks simultaneously, without being restricted to a specific problem or data domain. This is crucial for practical, real-world computer vision applications. In this study, we focus on the million-scale multi-domain universal object detection problem, which presents several challenges, including cross-dataset category label duplication, label conflicts, and the need to handle hierarchical taxonomies. Furthermore, there is an ongoing challenge in the field to find a resource-efficient way to leverage large pre-trained vision models for million-scale cross-dataset object detection. To address these challenges, we introduce our approach to label handling, hierarchy-aware loss design, and resource-efficient model training using a pre-trained large model. Our method was ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope that our detailed study will serve as a useful reference and alternative approach for similar problems in the computer vision community. The code is available at https://github.com/linfeng93/Large-UniDet.

arXiv information

Authors	Feng Lin, Wenze Hu, Yaowei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang
Published	2023-02-14 13:09:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
