D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

Summary

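State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors against 3D point clouds.
However, these procedures can incur significant costs in inference, storage, and updates over time.
In this study, we propose a direct learning-based approach that uses a simple network, named D2S, to represent complex local descriptors and their scene coordinates.
Our method is characterized by its simplicity and cost-effectiveness.
At test time, it uses only a single RGB image for localization and requires only a lightweight model to encode a complex sparse scene.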
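The test-time step can be pictured with a small sketch. It is not the authors' code and rests on several assumptions. First, D2S outputs a 3D scene coordinate and a reliability logit for each sparse descriptor of the query image. Second, only descriptors judged reliable are kept. Third, as is common for scene coordinate regression, the camera pose is then recovered by a PnP solver inside RANSAC. The sketch shows only the filtering and the inlier-scoring step of such a loop. The helpers `select_correspondences` and `count_inliers`, the 0.5 threshold, and the 5-pixel tolerance are all hypothetical.

```python
import numpy as np


def select_correspondences(keypoints_2d, pred_xyz, reliability_logits, threshold=0.5):
    """Keep only 2D-3D pairs whose predicted reliability exceeds `threshold`.
    Hypothetical helper; the per-descriptor reliability logit is an assumption."""
    prob = 1.0 / (1.0 + np.exp(-reliability_logits))  # sigmoid -> probability
    keep = prob > threshold
    return keypoints_2d[keep], pred_xyz[keep]


def count_inliers(points_2d, points_3d, K, R, t, max_error_px=5.0):
    """Score a candidate pose (R, t) by reprojection error, as the inner
    step of a RANSAC loop around a PnP solver would (assumed pipeline)."""
    cam = points_3d @ R.T + t                     # world -> camera coordinates
    in_front = cam[:, 2] > 0                      # points behind the camera cannot be inliers
    proj = cam @ K.T                              # apply the pinhole intrinsics
    depth = np.where(in_front, proj[:, 2], 1.0)   # avoid dividing by z <= 0
    uv = proj[:, :2] / depth[:, None]             # perspective division -> pixels
    err = np.linalg.norm(uv - points_2d, axis=1)
    return int(np.sum(in_front & (err < max_error_px)))


# Toy query: identity pose, one point behind the camera, one unreliable descriptor.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
xyz = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 4.0], [0.0, 0.0, -2.0]])
keypoints = np.array([[320.0, 240.0], [445.0, 302.5], [100.0, 100.0]])
logits = np.array([2.0, 2.0, -3.0])

kp, pts = select_correspondences(keypoints, xyz, logits)
print(len(kp), count_inliers(kp, pts, K, R, t))  # 2 2
```

Filtering before RANSAC is what keeps the solver cheap in this picture. If the network reliably rejects descriptors on sky, trees, or moving objects, fewer hypotheses are wasted on outliers.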
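D2S combines a simple loss function with graph attention to focus selectively on robust descriptors while ignoring regions such as clouds, trees, and some dynamic objects.
This selective attention enables D2S to effectively perform binary-semantic classification of sparse descriptors.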
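The idea of pairing graph attention with a loss that also performs binary classification can be illustrated with a toy NumPy sketch. It is not the D2S architecture or its exact loss; the paper should be consulted for those. The sketch assumes the following:

- The sparse descriptors of one image form the nodes of a fully connected graph.
- One single-head self-attention step updates the nodes.
- A linear head maps each node to a 3D coordinate plus one reliability logit.
- A descriptor is labelled robust (1) if it has a ground-truth 3D point, and non-robust (0) otherwise, for example on sky or trees.

All function names, weight shapes, and the choice of loss terms are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def graph_attention(desc, w_q, w_k, w_v):
    """One single-head self-attention step over a fully connected graph
    whose nodes are the N sparse descriptors of one image (desc: N x D)."""
    q, k, v = desc @ w_q, desc @ w_k, desc @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # N x N edge weights
    return desc + weights @ v  # residual update: each node aggregates messages


def predict(desc, params):
    """Hypothetical head: attention, then a linear map to (x, y, z, logit)."""
    h = graph_attention(desc, params["w_q"], params["w_k"], params["w_v"])
    out = h @ params["w_out"]  # N x 4
    return out[:, :3], out[:, 3]


def reliability_regression_loss(pred_xyz, logits, gt_xyz, is_robust, eps=1e-7):
    """Binary cross-entropy on the reliability logit for every descriptor, plus
    mean Euclidean coordinate error on descriptors labelled robust only."""
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    bce = -np.mean(is_robust * np.log(p) + (1.0 - is_robust) * np.log(1.0 - p))
    mask = is_robust.astype(bool)
    if not mask.any():
        return float(bce)
    coord = np.mean(np.linalg.norm(pred_xyz[mask] - gt_xyz[mask], axis=1))
    return float(coord + bce)


# Toy usage with random weights: 6 descriptors of dimension 8.
rng = np.random.default_rng(0)
n, d = 6, 8
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in ("w_q", "w_k", "w_v")}
params["w_out"] = rng.normal(scale=0.1, size=(d, 4))
desc = rng.normal(size=(n, d))
xyz, logits = predict(desc, params)
gt_xyz = rng.normal(size=(n, 3))
is_robust = np.array([1, 1, 1, 0, 0, 1], dtype=float)  # 0 = e.g. a sky/tree descriptor
print(reliability_regression_loss(xyz, logits, gt_xyz, is_robust))
```

In this toy version, non-robust descriptors contribute only to the classification term. The network is therefore never asked to regress coordinates for regions that have no stable 3D point. This is one plausible reading of "selectively focusing on robust descriptors".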
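In addition, we propose a simple outdoor dataset for evaluating how well visual localization methods generalize within a specific scene and update themselves from unlabeled observations.
Our approach outperforms previous regression-based methods in both indoor and outdoor environments.
It demonstrates the ability to generalize beyond the training data, including scenarios involving day-to-night transitions and adaptation to domain shifts.
The source code, trained models, dataset, and demo videos are available at: https://thpjp.github.io/d2s.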

Summary (original)

State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the previous regression-based methods in both indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts. The source code, trained models, dataset, and demo videos are available at the following link: https://thpjp.github.io/d2s.

arXiv information

Authors	Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee
Published	2024-10-22 19:09:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
