Million-scale Object Detection with Large Vision Model

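Abstract

Building a broad, general-purpose computer vision system has become a hot topic over the past few years.
A powerful universal system can solve diverse vision tasks simultaneously without being restricted to a specific problem or data domain, which matters greatly in real-world computer vision applications.
This study pushes that direction forward by concentrating on the million-scale, multi-domain universal object detection problem.
The problem is not trivial: category labels are duplicated across datasets, labels conflict, and hierarchical taxonomies must be handled.
Moreover, how to use emerging large pre-trained vision models resource-efficiently for million-scale cross-dataset object detection remains an open challenge.
This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design, and resource-efficient model training with a pre-trained large model.
Our method is ranked second in the object detection track of Robust Vision Challenge 2022 (RVC 2022).
We hope our detailed study will serve as an alternative practice paradigm for similar problems in the community.
The code is available at https://github.com/linfeng93/Large-UniDet.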

Abstract (original)

Over the past few years, developing a broad, universal, and general-purpose computer vision system has become a hot topic. A powerful universal system would be capable of solving diverse vision tasks simultaneously without being restricted to a specific problem or a specific data domain, which is of great importance in practical real-world computer vision applications. This study pushes the direction forward by concentrating on the million-scale multi-domain universal object detection problem. The problem is not trivial due to its complicated nature in terms of cross-dataset category label duplication, label conflicts, and the hierarchical taxonomy handling. Moreover, what is the resource-efficient way to utilize emerging large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design and resource-efficient model training with a pre-trained large model. Our method is ranked second in the object detection track of Robust Vision Challenge 2022 (RVC 2022). We hope our detailed study would serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.

arXiv information

Authors	Feng Lin, Wenze Hu, Yaowei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang
Published	2022-12-19 12:40:13+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
