CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration

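Summary

Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud.
It plays an important role in localizing a camera within a pre-built LiDAR map.
Despite the modality gap, most learning-based methods establish 2D-3D point correspondences in feature space without any feedback mechanism for iterative optimization, which results in poor accuracy and interpretability.
This paper proposes reformulating the registration procedure as an iterative Markov decision process, so that the camera pose can be adjusted incrementally based on each intermediate state.
To achieve this, the authors use reinforcement learning to develop a cross-modal registration agent (CMR-Agent), and use imitation learning to initialize its registration policy for stable training and a quick start.
Based on the cross-modal observations, they propose a 2D-3D hybrid state representation that fully exploits the fine-grained features of RGB images while reducing the useless neutral states caused by the spatial truncation of the camera frustum.
In addition, the overall framework is designed to efficiently reuse one-shot cross-modal embeddings, avoiding repeated, time-consuming feature extraction.
Extensive experiments on the KITTI-Odometry and NuScenes datasets demonstrate that CMR-Agent achieves competitive accuracy and efficiency in registration.
Once the one-shot embeddings have been computed, each iteration takes only a few milliseconds.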

Abstract (original)

Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud. It plays an important role in camera localization within pre-built LiDAR maps. Despite the modality gaps, most learning-based methods establish 2D-3D point correspondences in feature space without any feedback mechanism for iterative optimization, resulting in poor accuracy and interpretability. In this paper, we propose to reformulate the registration procedure as an iterative Markov decision process, allowing for incremental adjustments to the camera pose based on each intermediate state. To achieve this, we employ reinforcement learning to develop a cross-modal registration agent (CMR-Agent), and use imitation learning to initialize its registration policy for stability and quick-start of the training. According to the cross-modal observations, we propose a 2D-3D hybrid state representation that fully exploits the fine-grained features of RGB images while reducing the useless neutral states caused by the spatial truncation of camera frustum. Additionally, the overall framework is well-designed to efficiently reuse one-shot cross-modal embeddings, avoiding repetitive and time-consuming feature extraction. Extensive experiments on the KITTI-Odometry and NuScenes datasets demonstrate that CMR-Agent achieves competitive accuracy and efficiency in registration. Once the one-shot embeddings are completed, each iteration only takes a few milliseconds.
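
The iterative formulation described in the abstract can be illustrated with a toy loop. The following is only a minimal sketch under strong assumptions, not the authors' method: the learned policy is replaced by a greedy search over a small discrete action set, and the 2D-3D hybrid state is replaced by a reprojection error computed from known 2D-3D correspondences, which the real task does not provide. All names here (`rotation`, `project`, `cost`, `candidate_actions`, `register`) and the step sizes are hypothetical.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix about a coordinate axis (0=x, 1=y, 2=z)."""
    c, s = np.cos(angle), np.sin(angle)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    R = np.eye(3)
    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
    return R

def project(points, R, t, K):
    """Project 3D points into the image using pose (R, t) and intrinsics K."""
    cam = points @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def cost(points, pixels, R, t, K):
    """Mean reprojection error in pixels (toy stand-in for the state signal)."""
    return float(np.mean(np.linalg.norm(project(points, R, t, K) - pixels, axis=1)))

def candidate_actions(rot_step, trans_step):
    """Discrete actions: plus or minus one step on each of the six pose DoF."""
    actions = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            actions.append((rotation(axis, sign * rot_step), np.zeros(3)))
            delta = np.zeros(3)
            delta[axis] = sign * trans_step
            actions.append((np.eye(3), delta))
    return actions

def register(points, pixels, K, R, t, iters=200, rot_step=0.05, trans_step=0.2):
    """Greedy stand-in for a learned policy: take the best action each step."""
    err = cost(points, pixels, R, t, K)
    for _ in range(iters):
        # Apply each incremental action to the current pose estimate.
        candidates = [(dR @ R, dR @ t + dt)
                      for dR, dt in candidate_actions(rot_step, trans_step)]
        best_R, best_t = min(candidates,
                             key=lambda p: cost(points, pixels, p[0], p[1], K))
        best_err = cost(points, pixels, best_R, best_t, K)
        if best_err < err:
            R, t, err = best_R, best_t, best_err
        else:
            # No action helps at this step size, so refine the steps.
            rot_step *= 0.5
            trans_step *= 0.5
    return R, t, err

# Synthetic scene: random points in front of a pinhole camera.
rng = np.random.default_rng(0)
points = np.column_stack([rng.uniform(-2, 2, 100),
                          rng.uniform(-2, 2, 100),
                          rng.uniform(8, 15, 100)])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_true, t_true = np.eye(3), np.array([0.1, -0.2, 0.5])
pixels = project(points, R_true, t_true, K)

# Perturbed initial pose, then iterative refinement.
R0 = rotation(1, 0.1) @ R_true
t0 = t_true + np.array([0.5, -0.3, 0.4])
initial_err = cost(points, pixels, R0, t0, K)
R_est, t_est, final_err = register(points, pixels, K, R0, t0)
print(f"reprojection error: {initial_err:.2f} px -> {final_err:.2f} px")
```

The point of the sketch is the control flow: the pose is never regressed in one shot but updated by small actions chosen from the current intermediate state, and the expensive inputs (here `points` and `pixels`, in the paper the one-shot embeddings) are computed once and reused in every iteration.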

arXiv information

Authors	Gongxin Yao, Yixin Xuan, Xinyang Li, Yu Pan
Published	2024-08-05 11:40:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
