LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving

要約

LiDAR は、自動運転における堅牢な 3D シーン認識に不可欠です。
LiDAR 認識には、カメラ認識に次いで最大の文献があります。
ただし、LiDAR を使用した検出、セグメンテーション、動き推定などのタスクにわたるマルチタスク学習は、特に自動車グレードの組み込みプラットフォームでは比較的未開発のままです。
LiDAR ベースの物体検出、セマンティクス、およびモーションセグメンテーションのためのリアルタイムマルチタスク畳み込みニューラルネットワークを紹介します。
統合アーキテクチャは共有エンコーダとタスク固有のデコーダで構成され、共同表現学習を可能にします。
我々は、物体検出を改善するために意味論的特徴を選択的に転送するための新しい意味論的重み付けおよびガイダンス（SWAG）モジュールを提案します。
私たちの異種トレーニングスキームは、多様なデータセットを組み合わせ、タスク間の補完的な手がかりを活用します。
この研究は、LiDAR 点群からのこれらの主要な認識タスクを統合する最初の組み込み実装を提供し、組み込み NVIDIA Xavier プラットフォーム上で 3 ミリ秒の遅延を実現します。
セマンティックセグメンテーションとモーションセグメンテーションという 2 つのタスクで最先端の結果を達成し、3D オブジェクト検出では最先端に近いパフォーマンスを実現します。
ハードウェア効率を最大化し、マルチタスクの相乗効果を活用することで、私たちの方法は、現実世界の自動運転展開に合わせて調整された正確で効率的なソリューションを提供します。
定性的な結果は https://youtu.be/H-hWRzv2lIY でご覧いただけます。

要約(オリジナル)

LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks like detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. The work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds achieving 3ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.

arxiv情報

著者	Sambit Mohapatra,Senthil Yogamani,Varun Ravi Kumar,Stefan Milz,Heinrich Gotzig,Patrick Mäder
発行日	2024-11-18 21:51:16+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー