A Generalized Multi-Modal Fusion Detection Framework

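Abstract

LiDAR point clouds have become the most common data source in autonomous driving.
However, because point clouds are sparse, accurate and reliable detection cannot be achieved in certain scenarios.
Images complement point clouds and are therefore attracting increasing attention.
Despite some success, existing fusion methods either perform hard fusion or do not fuse the modalities directly.
This paper proposes MMFusion, a generic 3D detection framework that uses multi-modal features.
The framework aims to fuse LiDAR and image data accurately in order to improve 3D detection in complex scenes.
It consists of two separate streams, a LiDAR stream and a camera stream, each compatible with any single-modal feature extraction network.
The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and the Multi-modal Feature Fusion Module then selectively combines the feature outputs of the two streams to achieve better fusion.
Extensive experiments show that the framework not only outperforms existing benchmarks but also improves their detection, particularly for cyclists and pedestrians on the KITTI benchmark, with strong robustness and generalization.
We hope this work will stimulate further research into multi-modal fusion for autonomous driving tasks.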
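
The abstract does not say how the Multi-modal Feature Fusion Module "selectively combines" the two streams. As a rough illustration of what selective (soft) fusion can mean, as opposed to hard fusion such as plain concatenation, the sketch below blends a LiDAR feature vector and a camera feature vector with a per-channel sigmoid gate. Everything here is an assumption made for illustration: the function names `sigmoid` and `gated_fusion`, the gate parameters `w_lidar`, `w_camera`, and `bias`, and the gating formula itself are hypothetical and are not taken from MMFusion.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    lidar_feat: List[float],
    camera_feat: List[float],
    w_lidar: List[float],
    w_camera: List[float],
    bias: List[float],
) -> List[float]:
    """Blend two equally sized feature vectors channel by channel.

    Hypothetical gate (NOT the paper's module):
        gate_i  = sigmoid(w_lidar[i] * lidar_i + w_camera[i] * camera_i + bias[i])
        fused_i = gate_i * lidar_i + (1 - gate_i) * camera_i
    A gate near 1 keeps the LiDAR channel; a gate near 0 keeps the camera channel.
    """
    sizes = {len(lidar_feat), len(camera_feat), len(w_lidar), len(w_camera), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for l, c, wl, wc, b in zip(lidar_feat, camera_feat, w_lidar, w_camera, bias):
        gate = sigmoid(wl * l + wc * c + b)
        fused.append(gate * l + (1.0 - gate) * c)
    return fused


if __name__ == "__main__":
    # With all gate parameters at zero, every gate is 0.5 and the result is the mean.
    print(gated_fusion([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

In a trained network the gate parameters would be learned, so the model could lean on image features where the point cloud is sparse and on LiDAR features elsewhere. Whether MMFusion does this is not stated in the abstract.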

Abstract (original)

LiDAR point clouds have become the most common data source in autonomous driving. However, due to the sparsity of point clouds, accurate and reliable detection cannot be achieved in specific scenarios. Because of their complementarity with point clouds, images are getting increasing attention. Although with some success, existing fusion methods either perform hard fusion or do not fuse in a direct manner. In this paper, we propose a generic 3D detection framework called MMFusion, using multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. Our framework consists of two separate streams: the LiDAR stream and the camera stream, which can be compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and then the Multi-modal Feature Fusion Module selectively combines feature output from different streams to achieve better fusion. Extensive experiments have shown that our framework not only outperforms existing benchmarks but also improves their detection, especially for detecting cyclists and pedestrians on KITTI benchmarks, with strong robustness and generalization capabilities. Hopefully, our work will stimulate more research into multi-modal fusion for autonomous driving tasks.

arXiv information

Authors	Leichao Cui, Xiuxian Li, Min Meng, Xiaoyu Mo
Published	2023-03-13 12:38:07+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
