SDVRF: Sparse-to-Dense Voxel Region Fusion for Multi-modal 3D Object Detection

Summary

Title: SDVRF (Sparse-to-Dense Voxel Region Fusion) for Multi-modal 3D Object Detection

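Summary:
– In perception tasks for autonomous driving, multi-modal methods have become a trend because LiDAR point clouds and image data have complementary characteristics.
– However, the performance of previous methods is usually limited either by the sparsity of the point cloud or by noise caused by misalignment between the LiDAR and the camera.
– To address both problems, the authors propose a new concept, the Voxel Region (VR), which is obtained by dynamically projecting the sparse local point cloud inside each voxel.
– They then propose a new fusion method, Sparse-to-Dense Voxel Region Fusion (SDVRF). Specifically, more pixels of the image feature map inside the VR are gathered to supplement the voxel feature extracted from sparse points, which yields a denser fusion. Unlike prior methods that project fixed-size grids, this strategy of generating dynamic regions achieves better alignment and avoids introducing excess background noise.
– In addition, they propose a multi-scale fusion framework to capture the features of objects of different sizes and to extract more contextual information.
– Experiments on the KITTI dataset show that the method improves the performance of different baselines, with particularly strong results on small classes such as Pedestrian and Cyclist.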
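The Voxel Region idea summarized above can be illustrated with a minimal NumPy sketch. The abstract does not say how the region is shaped or how its pixels are aggregated, so everything here is an assumption made for illustration: the region is taken to be the axis-aligned bounding box of a voxel's projected points, the pixels inside it are mean-pooled, and the result is concatenated with the voxel feature. The function names (`project_points`, `voxel_region`, `pool_region`, `fuse`) and the 3 x 4 projection matrix are hypothetical and are not taken from the paper. All points are assumed to lie in front of the camera.

```python
import numpy as np


def project_points(points_xyz, proj_matrix):
    """Project N x 3 LiDAR points to pixel coordinates with a 3 x 4 matrix."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # N x 4 homogeneous points
    uvw = homo @ proj_matrix.T                        # N x 3
    return uvw[:, :2] / uvw[:, 2:3]                   # divide by depth


def voxel_region(points_xyz, proj_matrix, feat_hw):
    """Assumed region: inclusive bounding box (u0, v0, u1, v1) of the
    projected points of one voxel, clipped to the feature map."""
    uv = project_points(points_xyz, proj_matrix)
    h, w = feat_hw
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    u0, u1 = np.clip([u0, u1], 0, w - 1)
    v0, v1 = np.clip([v0, v1], 0, h - 1)
    return int(u0), int(v0), int(u1), int(v1)


def pool_region(image_feat, region):
    """Mean-pool a C x H x W feature map over an inclusive region
    (assumed aggregation; the paper may use something else)."""
    u0, v0, u1, v1 = region
    return image_feat[:, v0:v1 + 1, u0:u1 + 1].mean(axis=(1, 2))


def fuse(voxel_feat, image_feat, points_xyz, proj_matrix):
    """Concatenate a voxel feature with the pooled image feature of its region."""
    region = voxel_region(points_xyz, proj_matrix, image_feat.shape[1:])
    return np.concatenate([voxel_feat, pool_region(image_feat, region)])
```

Because the box follows the points that actually fall in the voxel, its size changes from voxel to voxel. A fixed-size grid projection would instead cover the same footprint regardless of where the points lie, which is the source of the background noise the summary mentions.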

Abstract (original)

In the perception task of autonomous driving, multi-modal methods have become a trend due to the complementary characteristics of LiDAR point clouds and image data. However, the performance of previous methods is usually limited by the sparsity of the point cloud or the noise problem caused by the misalignment between LiDAR and the camera. To solve these two problems, we present a new concept, Voxel Region (VR), which is obtained by projecting the sparse local point clouds in each voxel dynamically. And we propose a novel fusion method, named Sparse-to-Dense Voxel Region Fusion (SDVRF). Specifically, more pixels of the image feature map inside the VR are gathered to supplement the voxel feature extracted from sparse points and achieve denser fusion. Meanwhile, different from prior methods, which project the size-fixed grids, our strategy of generating dynamic regions achieves better alignment and avoids introducing too much background noise. Furthermore, we propose a multi-scale fusion framework to extract more contextual information and capture the features of objects of different sizes. Experiments on the KITTI dataset show that our method improves the performance of different baselines, especially on classes of small size, including Pedestrian and Cyclist.

arXiv information

Authors	Binglu Ren, Jianqin Yin
Published	2023-05-02 01:27:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
