BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo

Summary

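Contemporary camera-based 3D object detection methods hit a performance bottleneck because of the inherent ambiguity of depth perception.
Intuitively, temporal multi-view stereo (MVS) is a natural way to tackle this ambiguity.
However, conventional MVS approaches have two shortcomings when applied to 3D object detection scenes:
1) Measuring the affinity among all views is computationally expensive.
2) They struggle with outdoor scenarios, where objects are often moving.
To address this, we introduce an effective temporal stereo method that dynamically selects the scale of matching candidates, which significantly reduces computation overhead.
Going one step further, we design an iterative algorithm that updates toward more valuable candidates, making the method adaptive to moving candidates.
We instantiate the proposed method in a multi-view 3D detector named BEVStereo.
BEVStereo achieves new state-of-the-art performance (52.5% mAP and 61.0% NDS) on the camera-only track of the nuScenes dataset.
Extensive experiments also indicate that our method handles complex outdoor scenarios better than contemporary MVS approaches.
The code is available at https://github.com/Megvii-BaseDetection/BEVStereo.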
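
The idea of searching a small, adaptive set of depth candidates instead of the full depth range can be illustrated with a toy one-dimensional sketch. This is not BEVStereo's implementation: `matching_score` is a hypothetical stand-in for a stereo matching cost, `true_depth` exists only so the toy has something to converge to, and the update rule (score-weighted mean and spread) is an assumption chosen for simplicity.

```python
import math


def matching_score(depth, true_depth, sharpness=2.0):
    """Hypothetical stand-in for a stereo matching score; peaks at true_depth."""
    return math.exp(-((depth - true_depth) / sharpness) ** 2)


def refine_depth(mu, sigma, true_depth, num_candidates=5, num_iters=4):
    """Iteratively re-center and narrow a small candidate set around mu.

    mu is the current depth estimate and sigma the half-width of the
    search range. Both names are illustrative assumptions.
    """
    for _ in range(num_iters):
        # Sample a few candidates in [mu - sigma, mu + sigma] rather than
        # scanning the whole depth range.
        step = 2 * sigma / (num_candidates - 1)
        candidates = [mu - sigma + i * step for i in range(num_candidates)]
        scores = [matching_score(d, true_depth) for d in candidates]
        total = sum(scores)
        if total == 0.0:
            break  # no candidate matched at all; keep the current estimate
        # The score-weighted mean becomes the new center.
        mu = sum(d * s for d, s in zip(candidates, scores)) / total
        # The score-weighted spread becomes the new search half-width,
        # floored so the range never collapses to zero.
        var = sum(s * (d - mu) ** 2 for d, s in zip(candidates, scores)) / total
        sigma = max(math.sqrt(var), 0.1)
    return mu, sigma


if __name__ == "__main__":
    mu, sigma = refine_depth(mu=20.0, sigma=8.0, true_depth=24.0)
    print(f"refined depth: {mu:.2f}, search half-width: {sigma:.2f}")
```

Each iteration evaluates only `num_candidates` depths, so the cost per pixel stays fixed while the search range shrinks around the best-matching region.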

Summary (original)

Bounded by the inherent ambiguity of depth perception, contemporary camera-based 3D object detection methods fall into the performance bottleneck. Intuitively, leveraging temporal multi-view stereo (MVS) technology is the natural knowledge for tackling this ambiguity. However, traditional attempts of MVS are flawed in two aspects when applying to 3D object detection scenes: 1) The affinity measurement among all views suffers expensive computation cost; 2) It is difficult to deal with outdoor scenarios where objects are often mobile. To this end, we introduce an effective temporal stereo method to dynamically select the scale of matching candidates, enable to significantly reduce computation overhead. Going one step further, we design an iterative algorithm to update more valuable candidates, making it adaptive to moving candidates. We instantiate our proposed method to multi-view 3D detector, namely BEVStereo. BEVStereo achieves the new state-of-the-art performance (i.e., 52.5% mAP and 61.0% NDS) on the camera-only track of nuScenes dataset. Meanwhile, extensive experiments reflect our method can deal with complex outdoor scenarios better than contemporary MVS approaches. Codes have been released at https://github.com/Megvii-BaseDetection/BEVStereo.

arXiv information

Authors	Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li
Published	2022-09-21 10:21:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
