ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving

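Abstract

Reliable 3D object recognition is essential for autonomous driving. 4D radar, which can sense in all weather conditions, has recently attracted attention. Compared with LiDAR, however, 4D radar yields sparser point clouds. This paper proposes ZFusion, a 3D object detection method that fuses 4D radar with the vision modality. At the core of ZFusion, the proposed FP-DDCA (Feature Pyramid-Double Deformable Cross Attention) fuser effectively complements (sparse) radar information with (dense) vision information. Specifically, the FP-DDCA fuser has a feature-pyramid structure and uses Transformer blocks to interactively fuse multi-modal features at different scales, improving perception accuracy. In addition, a Depth-Context-Split view transformation module is used because of the physical properties of 4D radar. Because 4D radar costs far less than LiDAR, ZFusion is an attractive alternative to LiDAR-based methods. Experiments on typical traffic scenarios such as the VoD (View-of-Delft) dataset show that, at reasonable inference speed, ZFusion achieves state-of-the-art mAP (mean average precision) in the region of interest while remaining competitive with the baseline methods in mAP over the entire area, performing close to LiDAR and far better than camera-only methods.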

Abstract (original)

Reliable 3D object perception is essential in autonomous driving. Owing to its sensing capabilities in all weather conditions, 4D radar has recently received much attention. However, compared to LiDAR, 4D radar provides a much sparser point cloud. In this paper, we propose a 3D object detection method, termed ZFusion, which fuses 4D radar and the vision modality. As the core of ZFusion, our proposed FP-DDCA (Feature Pyramid-Double Deformable Cross Attention) fuser effectively complements the (sparse) radar information with the (dense) vision information. Specifically, with a feature-pyramid structure, the FP-DDCA fuser packs Transformer blocks to interactively fuse multi-modal features at different scales, thus enhancing perception accuracy. In addition, we utilize the Depth-Context-Split view transformation module due to the physical properties of 4D radar. Considering that 4D radar has a much lower cost than LiDAR, ZFusion is an attractive alternative to LiDAR-based methods. In typical traffic scenarios like the VoD (View-of-Delft) dataset, experiments show that, with reasonable inference speed, ZFusion achieves state-of-the-art mAP (mean average precision) in the region of interest while having competitive mAP in the entire area compared to the baseline methods. It demonstrates performance close to LiDAR and greatly outperforms camera-only methods.
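
Since the results are reported as mAP, the following minimal sketch illustrates how a mean average precision can be computed from scored detections. It is not the VoD evaluation protocol: the all-point interpolation, the function names `average_precision` and `mean_average_precision`, and the sample detections are assumptions made for illustration, and matching detections to ground truth (for example by an IoU threshold) is assumed to have been done beforehand.

```python
from typing import Dict, List, Tuple

Detections = List[Tuple[float, bool]]  # (confidence, is_true_positive)


def average_precision(scored: Detections, num_gt: int) -> float:
    """All-point interpolated AP for one class (illustrative, not VoD's protocol)."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[Detections, int]]) -> float:
    """Mean of the per-class APs; per_class maps class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up detections for two classes, purely for illustration.
sample = {
    "car": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "pedestrian": ([(0.6, True)], 1),
}
sample_map = mean_average_precision(sample)
```

A region-of-interest mAP and an entire-area mAP, as distinguished in the abstract, would differ only in which detections and ground-truth objects are passed in.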

arXiv information

Authors	Sheng Yang, Tong Zhan, Shichen Qiao, Jicheng Gong, Qing Yang, Yanfeng Lu, Jian Wang
Published	2025-04-04 13:29:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
