AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction

Summary

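Autonomous driving requires an understanding of infrastructure elements such as lanes and crosswalks.
To navigate safely, this understanding must be derived from sensor data in real time and represented in vectorized form.
Learned bird's-eye-view (BEV) encoders are commonly used to combine a set of multi-view camera images into a single joint latent BEV grid.
Traditionally, an intermediate raster map is predicted from this latent space, which provides dense spatial supervision but requires post-processing into the desired vectorized form.
More recent models use vectorized map decoders to derive infrastructure elements directly as polylines, which provides instance-level information.
Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV grid augmentation, a novel technique that significantly enhances the latent BEV representation.
AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures, while remaining as straightforward to integrate and as generic as auxiliary supervision.
Experiments on the nuScenes and Argoverse2 datasets show significant improvements in vectorized map prediction performance, up to 13.3% over the StreamMapNet baseline at a 60 m range, with greater improvements at larger ranges.
Applying the method to another baseline confirms its transferability and yields similar improvements.
A detailed analysis of the latent BEV grid confirms that AugMapNet's latent space is more structured and shows the value of the novel concept beyond pure performance improvement.
The code will be released soon.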
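
To make the distinction between the raster and vectorized map formats concrete, the following is a minimal sketch, not taken from the paper, of how a map element given as a polyline could be rasterized into a BEV grid of the kind used as a dense supervision target. The function name `rasterize_polyline`, the grid convention (rows along y, columns along x), and the sampling-based rasterization are illustrative assumptions and do not describe AugMapNet's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rasterize_polyline(
    polyline: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[int]]:
    """Mark the BEV grid cells touched by a polyline (hypothetical helper).

    The grid is indexed as grid[row][col], with rows along y and columns
    along x. Points outside the given ranges are ignored.
    """
    n_cols = int(round((x_range[1] - x_range[0]) / cell_size))
    n_rows = int(round((y_range[1] - y_range[0]) / cell_size))
    grid = [[0] * n_cols for _ in range(n_rows)]

    def mark(x: float, y: float) -> None:
        col = math.floor((x - x_range[0]) / cell_size)
        row = math.floor((y - y_range[0]) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = 1

    if len(polyline) == 1:
        mark(*polyline[0])

    # Sample each segment at half-cell spacing and mark the sampled cells.
    # This is an approximation: a segment that only clips the corner of a
    # cell can pass between two samples and leave that cell unmarked.
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, math.ceil(length / (0.5 * cell_size)))
        for i in range(steps + 1):
            t = i / steps
            mark(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return grid


# A straight lane boundary along y = 0.5 m in a 4 m x 4 m grid with 1 m cells.
lane = [(0.5, 0.5), (3.5, 0.5)]
raster = rasterize_polyline(lane, (0.0, 4.0), (0.0, 4.0), 1.0)
```

The polyline keeps instance-level structure (an ordered list of points per element), whereas the raster only records per-cell occupancy. This is why a raster prediction needs post-processing before it can serve as a vectorized map.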

Summary (original)

Autonomous driving requires an understanding of the infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real-time and needs to be represented in vectorized form. Learned Bird’s-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, from this latent space, an intermediate raster map is predicted, providing dense spatial supervision but requiring post-processing into the desired vectorized form. More recent models directly derive infrastructure elements as polylines using vectorized map decoders, providing instance-level information. Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV grid augmentation, a novel technique that significantly enhances the latent BEV representation. AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures while remaining as straightforward to integrate and as generic as auxiliary supervision. Experiments on nuScenes and Argoverse2 datasets demonstrate significant improvements in vectorized map prediction performance up to 13.3% over the StreamMapNet baseline on 60m range and greater improvements on larger ranges. We confirm transferability by applying our method to another baseline and find similar improvements. A detailed analysis of the latent BEV grid confirms a more structured latent space of AugMapNet and shows the value of our novel concept beyond pure performance improvement. The code will be released soon.

arXiv information

Authors	Thomas Monninger, Md Zafar Anwar, Stanislaw Antol, Steffen Staab, Sihao Ding
Published	2025-03-17 17:55:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
