MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction

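Summary

Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments that contain diverse, irregularly shaped objects.
Vision-centric methods suffer from geometric inaccuracy, while LiDAR-based approaches often lack rich semantic information.
To address both limitations, the paper proposes MS-Occ, a multi-stage LiDAR-camera fusion framework that combines middle-stage and late-stage fusion, integrating LiDAR's geometric fidelity with the semantic richness of cameras through hierarchical cross-modal fusion.
The framework introduces innovations at two stages. (1) In middle-stage feature fusion, the Gaussian-Geo module applies Gaussian kernel rendering to sparse LiDAR depth maps to enhance 2D image features with dense geometric priors, and the Semantic-Aware module enriches LiDAR voxels with semantic context via deformable cross-attention.
(2) In late-stage voxel fusion, the Adaptive Fusion (AF) module dynamically balances voxel features across modalities, while the High Classification Confidence Voxel Fusion (HCCVF) module resolves semantic inconsistencies using self-attention-based refinement.
On the nuScenes-OpenOccupancy benchmark, MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%, surpassing the state of the art by +0.7% IoU and +2.4% mIoU.
Ablation studies further validate the contribution of each module and show substantial improvements in small-object perception, demonstrating the practical value of MS-Occ for safety-critical autonomous driving scenarios.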
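
The abstract does not spell out how the Gaussian-Geo module renders its kernels, so the following is only a minimal sketch of the general idea of densifying a sparse depth map with Gaussian kernels, not the authors' implementation. It assumes each LiDAR point has already been projected to an integer pixel position with a depth value; the function name `densify_depth` and the parameters `sigma` and `radius` are hypothetical.

```python
import math


def densify_depth(sparse_points, height, width, sigma=1.0, radius=2):
    """Spread sparse (row, col, depth) samples over a dense grid.

    Each output pixel is the Gaussian-weighted average of the depths of
    the samples whose square window (half-width `radius`) covers it.
    Pixels that no sample reaches stay at 0.0.
    """
    weighted_sum = [[0.0] * width for _ in range(height)]
    weight_total = [[0.0] * width for _ in range(height)]

    for row, col, depth in sparse_points:
        # Visit only the window around the sample, clipped to the image.
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                dist_sq = (r - row) ** 2 + (c - col) ** 2
                w = math.exp(-dist_sq / (2.0 * sigma ** 2))
                weighted_sum[r][c] += w * depth
                weight_total[r][c] += w

    # Normalize by the accumulated weight wherever at least one sample landed.
    return [
        [
            weighted_sum[r][c] / weight_total[r][c] if weight_total[r][c] > 0 else 0.0
            for c in range(width)
        ]
        for r in range(height)
    ]


# Two hypothetical projected LiDAR points on a 3x5 image.
dense = densify_depth([(1, 1, 10.0), (1, 3, 20.0)], height=3, width=5, radius=1)
```

In this toy input the pixel midway between the two samples receives equal weight from both, so its value is the average of the two depths. A real pipeline would operate on tensors and would likely feed the dense map to the image branch as an extra geometric cue.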

Summary (original)

Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments with diverse and irregular objects. While vision-centric methods suffer from geometric inaccuracies, LiDAR-based approaches often lack rich semantic information. To address these limitations, MS-Occ, a novel multi-stage LiDAR-camera fusion framework which includes middle-stage fusion and late-stage fusion, is proposed, integrating LiDAR’s geometric fidelity with camera-based semantic richness via hierarchical cross-modal fusion. The framework introduces innovations at two critical stages: (1) In the middle-stage feature fusion, the Gaussian-Geo module leverages Gaussian kernel rendering on sparse LiDAR depth maps to enhance 2D image features with dense geometric priors, and the Semantic-Aware module enriches LiDAR voxels with semantic context via deformable cross-attention; (2) In the late-stage voxel fusion, the Adaptive Fusion (AF) module dynamically balances voxel features across modalities, while the High Classification Confidence Voxel Fusion (HCCVF) module resolves semantic inconsistencies using self-attention-based refinement. Experiments on the nuScenes-OpenOccupancy benchmark show that MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%, surpassing the state-of-the-art by +0.7% IoU and +2.4% mIoU. Ablation studies further validate the contribution of each module, with substantial improvements in small-object perception, demonstrating the practical value of MS-Occ for safety-critical autonomous driving scenarios.
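
To make the two reported metrics concrete, here is a small sketch of how IoU and mIoU are commonly computed for semantic occupancy. It assumes the usual convention that IoU is class-agnostic (occupied versus free voxels) while mIoU averages per-class IoU over the semantic classes. The label `0` for free space and the function names are assumptions for illustration, not taken from the paper or the benchmark code.

```python
def occupancy_iou(pred, target, free_label=0):
    """Class-agnostic IoU between occupied voxels of prediction and ground truth."""
    pairs = list(zip(pred, target))
    inter = sum(1 for p, t in pairs if p != free_label and t != free_label)
    union = sum(1 for p, t in pairs if p != free_label or t != free_label)
    return inter / union if union else 0.0


def mean_iou(pred, target, classes):
    """Mean of per-class IoU, skipping classes absent from both inputs."""
    pairs = list(zip(pred, target))
    ious = []
    for cls in classes:
        inter = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy voxel grids: 0 = free, 1 and 2 = two semantic classes.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 0, 0]

print(occupancy_iou(pred, target))        # 0.75
print(mean_iou(pred, target, [1, 2]))
```

In the toy example three of the four voxels marked occupied by either grid are occupied in both, giving an IoU of 0.75, while the per-class scores (1/2 and 1/3) pull the mIoU lower. This is why a method can gain more in mIoU than in IoU when it mainly improves semantic labeling.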

arXiv information

Authors	Zhiqiang Wei, Lianqing Zheng, Jianan Liu, Tao Huang, Qing-Long Han, Wenwen Zhang, Fengdeng Zhang
Published	2025-04-22 13:33:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
