LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Summary

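Semantic bird's-eye-view (BEV) maps provide a rich representation with strong occlusion reasoning for a range of decision-making tasks in autonomous driving.
However, most BEV mapping approaches adopt a fully supervised learning paradigm that depends on large amounts of human-annotated BEV ground truth data.
This work addresses that limitation by proposing the first unsupervised representation learning approach that generates semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner.
The approach pretrains the network, without supervision, to reason independently about scene geometry and scene semantics using two disjoint neural pathways, and then finetunes it for semantic BEV mapping using only a small fraction of the labels in the BEV.
Label-free pretraining is achieved by exploiting the spatial and temporal consistency of FV images to learn scene geometry, while relying on a novel temporal masked autoencoder formulation to encode the scene representation.
Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that the approach performs on par with existing state-of-the-art approaches while using only 1% of the BEV labels and no additional labeled data.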

Summary (original)

Semantic Bird’s Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.

arXiv information

Authors	Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav Valada
Published	2024-05-29 08:03:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
