UniMODE: Unified Monocular 3D Object Detection

Summary

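Unified monocular 3D object detection that covers both indoor and outdoor scenes is important for applications such as robot navigation.
However, training a single model on data from such different scenarios is difficult because the scenarios differ significantly in their characteristics, for example in their geometric properties and domain distributions.
To address these challenges, the authors build a detector on the bird's-eye-view (BEV) detection paradigm, whose explicit feature projection helps resolve the ambiguity in geometry learning that arises when a detector is trained on data from multiple scenarios.
They then split the classical BEV detection architecture into two stages and propose an uneven BEV grid design to handle the convergence instability caused by these challenges.
They also develop a sparse BEV feature projection strategy to reduce computational cost and a unified domain alignment method to handle heterogeneous domains.
Combining these techniques yields the unified detector UniMODE, which surpasses the previous state of the art on the challenging Omni3D dataset (a large-scale dataset containing both indoor and outdoor scenes) by 4.9% AP_3D. The authors present this as the first successful generalization of a BEV detector to unified 3D object detection.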
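
The abstract does not say how the uneven BEV grid is laid out, so the following is only a sketch of one possible reading: depth bins whose width grows geometrically with distance, so that cells near the camera are finer than distant ones. The function names, the depth range, and the bin count are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def uneven_depth_edges(z_min, z_max, num_bins):
    """Return num_bins + 1 depth edges whose spacing grows geometrically.

    Assumption for illustration only: geometric (log-uniform) spacing.
    """
    return np.geomspace(z_min, z_max, num_bins + 1)


def depth_to_bin(depth, edges):
    """Map depths to bin indices; out-of-range depths are clipped to the first or last bin."""
    idx = np.searchsorted(edges, depth, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)


# Hypothetical range: 0.5 m to 64 m, split into 7 bins.
edges = uneven_depth_edges(z_min=0.5, z_max=64.0, num_bins=7)
widths = np.diff(edges)  # each bin is wider than the one before it
bins = depth_to_bin(np.array([0.6, 3.0, 50.0]), edges)
```

With these settings the near-range bins are under a metre wide while the farthest bin spans tens of metres, which is one way a single grid could serve both small indoor rooms and long outdoor ranges.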

Abstract (original)

Realizing unified monocular 3D object detection, including both indoor and outdoor scenes, holds great importance in applications like robot navigation. However, involving various scenarios of data to train models poses challenges due to their significantly different characteristics, e.g., diverse geometry properties and heterogeneous domain distributions. To address these challenges, we build a detector based on the bird’s-eye-view (BEV) detection paradigm, where the explicit feature projection is beneficial to addressing the geometry learning ambiguity when employing multiple scenarios of data to train detectors. Then, we split the classical BEV detection architecture into two stages and propose an uneven BEV grid design to handle the convergence instability caused by the aforementioned challenges. Moreover, we develop a sparse BEV feature projection strategy to reduce computational cost and a unified domain alignment method to handle heterogeneous domains. Combining these techniques, a unified detector UniMODE is derived, which surpasses the previous state-of-the-art on the challenging Omni3D dataset (a large-scale dataset including both indoor and outdoor scenes) by 4.9% AP_3D, revealing the first successful generalization of a BEV detector to unified 3D object detection.

arXiv information

Authors	Zhuoling Li, Xiaogang Xu, SerNam Lim, Hengshuang Zhao
Published	2024-09-17 16:00:21+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
