M^2-3DLaneNet: Multi-Modal 3D Lane Detection

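Summary

Estimating accurate lane lines in 3D space remains challenging because lanes are sparse and thin.
In this work, we propose M^2-3DLaneNet, a multi-modal framework for effective 3D lane detection.
To integrate complementary information from multiple sensors, M^2-3DLaneNet first extracts multi-modal features with modality-specific backbones, then fuses them in a unified bird's-eye-view (BEV) space.
Specifically, our method consists of two core components.
1) To achieve accurate 2D-3D mapping, we propose top-down BEV generation.
Within it, a Line-Restricted Deform-Attention (LRDA) module enhances image features in a top-down manner, capturing the slender shape of lanes.
It then casts the 2D pyramidal features into 3D space using depth-aware lifting and generates BEV features through pillarization.
2) We further propose bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from camera and LiDAR sensors.
Extensive experiments demonstrate the effectiveness of M^2-3DLaneNet, which outperforms previous state-of-the-art methods by a large margin, with a 12.1% F1-score improvement on the OpenLane dataset.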
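
To make the lifting and pillarization step in component 1) more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how depth-aware lifting is computed, so the sketch assumes a pinhole camera and a per-pixel probability distribution over a few discrete depth bins, in the style of lift-and-splat approaches. The names `lift_pixel` and `pillarize`, the intrinsics, and the grid ranges are all hypothetical.

```python
def lift_pixel(u, v, feature, depth_bins, depth_probs, fx, fy, cx, cy):
    """Back-project one pixel into one 3D point per depth bin (pinhole model).

    Assumption: each point carries the pixel feature scaled by the probability
    of its depth bin. Camera frame: x right, y down, z forward.
    """
    points = []
    for depth, prob in zip(depth_bins, depth_probs):
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        z = depth
        points.append(((x, y, z), [prob * f for f in feature]))
    return points


def pillarize(points, cell_size, x_min, x_max, z_min, z_max):
    """Sum the features of all points that fall into the same ground-plane cell.

    Height (y) is ignored, so every vertical column of space becomes one pillar.
    Returns a sparse BEV map: {(ix, iz): feature_vector}.
    """
    bev = {}
    for (x, _y, z), feat in points:
        if not (x_min <= x < x_max and z_min <= z < z_max):
            continue  # point lies outside the BEV grid
        cell = (int((x - x_min) // cell_size), int((z - z_min) // cell_size))
        acc = bev.setdefault(cell, [0.0] * len(feat))
        for k, value in enumerate(feat):
            acc[k] += value
    return bev


# Two pixels in the same image column, above and below the principal point.
intrinsics = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
depth_bins, depth_probs = [5.0, 10.0], [0.25, 0.75]
points = []
for v in (40, 60):
    points += lift_pixel(50, v, [1.0, 2.0], depth_bins, depth_probs, **intrinsics)

bev = pillarize(points, cell_size=1.0, x_min=-5.0, x_max=5.0, z_min=0.0, z_max=20.0)
print(bev)
```

In this toy setup both pixels land in the same two pillars (at 5 m and 10 m ahead) despite having different heights, so their weighted features are summed: cell (5, 5) holds [0.5, 1.0] and cell (5, 10) holds [1.5, 3.0].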

Summary (original)

Estimating accurate lane lines in 3D space remains challenging due to their sparse and slim nature. In this work, we propose the M^2-3DLaneNet, a Multi-Modal framework for effective 3D lane detection. Aiming at integrating complementary information from multi-sensors, M^2-3DLaneNet first extracts multi-modal features with modal-specific backbones, then fuses them in a unified Bird’s-Eye View (BEV) space. Specifically, our method consists of two core components. 1) To achieve accurate 2D-3D mapping, we propose the top-down BEV generation. Within it, a Line-Restricted Deform-Attention (LRDA) module is utilized to effectively enhance image features in a top-down manner, fully capturing the slenderness features of lanes. After that, it casts the 2D pyramidal features into 3D space using depth-aware lifting and generates BEV features through pillarization. 2) We further propose the bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from camera and LiDAR sensors. Sufficient experiments demonstrate the effectiveness of M^2-3DLaneNet, which outperforms previous state-of-the-art methods by a large margin, i.e., 12.1% F1-score improvement on OpenLane dataset.
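
The BEV fusion in component 2) can be illustrated with a deliberately simplified sketch. The abstract describes multi-scale cascaded attention but gives no further detail, so the code below shows only a single-scale, single-head cross-attention with a residual connection, in which camera BEV cells act as queries and LiDAR BEV cells as keys and values. That role assignment, the residual sum, and the names `cross_attend` and `fuse_bev` are assumptions made for illustration, not the paper's design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query gathers a weighted sum of values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def fuse_bev(cam_bev, lidar_bev):
    """Residual fusion (assumed): camera cells query LiDAR cells, result is added back."""
    attended = cross_attend(cam_bev, lidar_bev, lidar_bev)
    return [[c + a for c, a in zip(cam, att)] for cam, att in zip(cam_bev, attended)]


# One camera BEV cell and two LiDAR BEV cells, each with a 2-channel feature.
cam_bev = [[0.0, 2.0]]
lidar_bev = [[1.0, 0.0], [1.0, 0.0]]
print(fuse_bev(cam_bev, lidar_bev))
```

A multi-scale cascaded variant would presumably repeat a step like this across BEV resolutions and feed each result into the next stage, but that structure is not spelled out in the abstract and is left out here.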

arXiv information

Authors	Yueru Luo, Xu Yan, Chaoda Zheng, Chao Zheng, Shuqi Mei, Tang Kun, Shuguang Cui, Zhen Li
Published	2022-09-13 13:45:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
