SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras

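Summary

In autonomous driving, approaches that infer 3D objects directly in the bird's-eye view (BEV) from multiple cameras have attracted considerable interest.
Some work has also used 2D detectors on single images to improve 3D detection performance.
However, these approaches rely on a two-stage process with separate detectors, and the 2D detection results are used only once, for token selection or query initialization.
This paper presents SimPB, a single model that simultaneously detects 2D objects in the perspective view and 3D objects in BEV space from multiple cameras.
To achieve this, the authors introduce a hybrid decoder consisting of several multi-view 2D decoder layers and several 3D decoder layers, each designed for its respective detection task.
A Dynamic Query Allocation module and an Adaptive Query Aggregation module are proposed to continuously update and refine the interaction between 2D and 3D results in a cyclic 3D-2D-3D manner.
In addition, Query-group Attention strengthens the interaction among 2D queries within each camera group.
The method is evaluated on the nuScenes dataset and shows promising results on both 2D and 3D detection tasks.
The code is available at https://github.com/nullmax-vision/SimPB.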
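
The summary says that Query-group Attention strengthens interaction among 2D queries within each camera group, but it does not say how. One plausible reading is a self-attention mask that lets a 2D query attend only to queries assigned to the same camera. The sketch below illustrates that reading with NumPy; the function names, the single-head formulation, and the per-query camera_ids array are assumptions made for illustration and are not taken from the SimPB repository.

```python
import numpy as np

def group_attention_mask(camera_ids):
    """Boolean (N, N) mask: True where query i and query j share a camera.

    Hypothetical helper; camera_ids holds one camera index per 2D query.
    """
    camera_ids = np.asarray(camera_ids)
    return camera_ids[:, None] == camera_ids[None, :]

def masked_self_attention(queries, camera_ids):
    """Single-head scaled dot-product self-attention restricted to camera groups."""
    mask = group_attention_mask(camera_ids)
    dim = queries.shape[-1]
    scores = queries @ queries.T / np.sqrt(dim)
    # Block attention across cameras; the diagonal is always allowed,
    # so every row keeps at least one finite score.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ queries, weights

# Five 2D queries: two assigned to camera 0, three to camera 1.
rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 8))
camera_ids = [0, 0, 1, 1, 1]
out, weights = masked_self_attention(queries, camera_ids)
```

With this mask, each row of weights sums to one and is exactly zero for any pair of queries from different cameras, so information is mixed only inside a camera group.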

Abstract (original)

The field of autonomous driving has attracted considerable interest in approaches that directly infer 3D objects in the Bird’s Eye View (BEV) from multiple cameras. Some attempts have also explored utilizing 2D detectors from single images to enhance the performance of 3D detection. However, these approaches rely on a two-stage process with separate detectors, where the 2D detection results are utilized only once for token selection or query initialization. In this paper, we present a single model termed SimPB, which simultaneously detects 2D objects in the perspective view and 3D objects in the BEV space from multiple cameras. To achieve this, we introduce a hybrid decoder consisting of several multi-view 2D decoder layers and several 3D decoder layers, specifically designed for their respective detection tasks. A Dynamic Query Allocation module and an Adaptive Query Aggregation module are proposed to continuously update and refine the interaction between 2D and 3D results, in a cyclic 3D-2D-3D manner. Additionally, Query-group Attention is utilized to strengthen the interaction among 2D queries within each camera group. In the experiments, we evaluate our method on the nuScenes dataset and demonstrate promising results for both 2D and 3D detection tasks. Our code is available at: https://github.com/nullmax-vision/SimPB.
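
The abstract names a Dynamic Query Allocation module that links 3D results to 2D results, without describing its mechanics. A common way to relate a 3D point to multiple cameras is to project it with each camera's projection matrix and keep the cameras where it lands inside the image, in front of the lens. The sketch below shows only that generic projection step; the function name allocate_queries, the 3x4 projection matrices, the toy intrinsics, and the image size are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def allocate_queries(centers_3d, projections, image_size):
    """Map each camera index to the 3D query indices visible in that camera.

    centers_3d: (N, 3) points in a shared frame.
    projections: list of hypothetical 3x4 matrices (intrinsics @ extrinsics).
    image_size: (width, height) in pixels.
    """
    width, height = image_size
    n = centers_3d.shape[0]
    homog = np.hstack([centers_3d, np.ones((n, 1))])  # (N, 4)
    allocation = {}
    for cam_idx, proj in enumerate(projections):
        uvw = homog @ proj.T                          # (N, 3)
        depth = uvw[:, 2]
        in_front = depth > 1e-6
        uv = np.zeros((n, 2))
        uv[in_front] = uvw[in_front, :2] / depth[in_front, None]
        visible = (
            in_front
            & (uv[:, 0] >= 0) & (uv[:, 0] < width)
            & (uv[:, 1] >= 0) & (uv[:, 1] < height)
        )
        allocation[cam_idx] = np.flatnonzero(visible).tolist()
    return allocation

# Toy setup: a front camera looking along +z and a rear camera along -z.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
front = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
rear = K @ np.hstack([np.diag([-1.0, 1.0, -1.0]), np.zeros((3, 1))])
centers = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, -10.0], [100.0, 0.0, 1.0]])
allocation = allocate_queries(centers, [front, rear], image_size=(100, 100))
```

In this toy setup the first point is assigned to the front camera, the second to the rear camera, and the third to neither, because it projects far outside the front image. A point in an overlap region between two cameras would be listed under both.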

arXiv information

Authors	Yingqi Tang,Zhaotie Meng,Guoliang Chen,Erkang Cheng
Published	2024-03-15 14:39:39+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
