Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

Abstract (original)

3D object detection in Bird’s-Eye-View (BEV) space has recently emerged as a prevalent approach in the field of autonomous driving. Despite the demonstrated improvements in accuracy and velocity estimation compared to perspective view methods, the deployment of BEV-based techniques in real-world autonomous vehicles remains challenging. This is primarily due to their reliance on vision-transformer (ViT) based architectures, which introduce quadratic complexity with respect to the input resolution. To address this issue, we propose an efficient BEV-based 3D detection framework called BEVENet, which leverages a convolutional-only architectural design to circumvent the limitations of ViT models while maintaining the effectiveness of BEV-based methods. Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge, achieving a mean average precision (mAP) of 0.456 and a nuScenes detection score (NDS) of 0.555 on the NuScenes validation dataset, with an inference speed of 47.6 frames per second. To the best of our knowledge, this study stands as the first to achieve such significant efficiency improvements for BEV-based methods, highlighting their enhanced feasibility for real-world autonomous driving applications.
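As a rough illustration of the scaling argument in the abstract, the sketch below compares how the cost of one global self-attention layer and one convolution layer grows with input resolution, and converts the reported 47.6 frames per second into per-frame latency. The function names, patch size, channel widths, and image sizes are illustrative assumptions, not values taken from BEVENet, and the formulas are simplified multiply-accumulate counts rather than measurements.

```python
def attention_macs(height, width, patch=16, dim=256):
    """Approximate multiply-accumulates for one global self-attention layer.

    Counts only the two token-by-token products (QK^T and attention @ V),
    which are the terms that grow quadratically with the token count.
    patch and dim are hypothetical values chosen for illustration.
    """
    tokens = (height // patch) * (width // patch)
    return 2 * tokens * tokens * dim


def conv_macs(height, width, kernel=3, c_in=256, c_out=256):
    """Approximate multiply-accumulates for one stride-1 convolution layer
    with 'same' padding. Cost grows linearly with the number of pixels."""
    return height * width * kernel * kernel * c_in * c_out


def frame_latency_ms(fps):
    """Convert a throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    small = (256, 256)   # hypothetical input size
    large = (512, 512)   # same image with each side doubled

    attn_growth = attention_macs(*large) / attention_macs(*small)
    conv_growth = conv_macs(*large) / conv_macs(*small)

    print(f"attention cost grows by {attn_growth:.0f}x")
    print(f"convolution cost grows by {conv_growth:.0f}x")
    print(f"47.6 FPS is about {frame_latency_ms(47.6):.1f} ms per frame")
```

Under these assumptions, doubling each side of the input quadruples the number of pixels and tokens, so the convolution cost grows by 4x while the attention cost grows by 16x. This is the gap that a convolutional-only design aims to avoid.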

arXiv information

Authors	Yuxin Li, Qiang Han, Mengying Yu, Yuxin Jiang, Chaikiat Yeo, Yiheng Li, Zihang Huang, Nini Liu, Hsuanhan Chen, Xiaojun Wu
Published	2023-12-01 14:52:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
