FaVoR: Features via Voxel Rendering for Camera Relocalization

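Abstract

Camera relocalization methods range from dense image alignment to direct regression of the camera pose from a query image.
Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach for many applications.
However, feature-based methods often struggle with large changes in viewpoint and appearance, which lead to matching failures and inaccurate pose estimates.
To overcome this limitation, we propose a new approach that leverages a globally sparse but locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we build a sparse voxel map optimized to render the image patch descriptors observed during tracking.
Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose.
This methodology makes it possible to generate descriptors for unseen views, which strengthens robustness to view changes.
We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets.
Our results show that the method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, improving the median translation error by up to 39%.
In addition, our approach yields results comparable to other methods in outdoor scenarios while keeping memory and computational costs low.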

Abstract (original)

Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
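
The render-then-match step described in the abstract can be illustrated with a minimal sketch. It assumes standard NeRF-style alpha compositing along a single ray through a voxel and a mutual-nearest-neighbor matcher on cosine similarity; the abstract does not specify either formulation, and the function names `render_descriptor` and `mutual_nearest_matches`, as well as the input layout, are hypothetical.

```python
import numpy as np

def render_descriptor(sigmas, features, deltas):
    """Alpha-composite per-sample features along one ray.

    Assumed (not from the paper): NeRF-style quadrature with
    densities `sigmas` (N,), features (N, D), step sizes `deltas` (N,).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    features = np.asarray(features, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # Opacity contributed by each sample.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of the ray that reaches sample i.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = trans * alphas
    desc = weights @ features
    # L2-normalize so that dot products act as cosine similarity.
    norm = np.linalg.norm(desc)
    return (desc / norm if norm > 0 else desc), weights

def mutual_nearest_matches(rendered, query):
    """Return (rendered_idx, query_idx) pairs that are mutual best matches."""
    sim = np.asarray(rendered) @ np.asarray(query).T
    fwd = sim.argmax(axis=1)  # best query for each rendered descriptor
    bwd = sim.argmax(axis=0)  # best rendered descriptor for each query
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

# Toy ray: empty space, then an opaque sample that hides the last one.
desc, weights = render_descriptor(
    sigmas=[0.0, 1e9, 1.0],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    deltas=[1.0, 1.0, 1.0],
)
matches = mutual_nearest_matches(np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]]))
```

In a full pipeline the resulting 2D-3D matches would feed a pose solver; a PnP solver inside a RANSAC loop is a typical choice, although the abstract does not state which solver is used.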

arXiv information

Authors	Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly
Published	2024-11-29 20:48:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
