Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Summary

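Recent work has shown that vision models pre-trained on generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to use pre-trained foundation models for visual place recognition (VPR). Because the training objectives and training data of model pre-training and of VPR differ inherently, how to bridge this gap and fully unleash the capability of pre-trained models for VPR remains a key open issue. To this end, we propose a novel method that achieves seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on the salient landmarks that discriminate places, we design a hybrid adaptation method that performs global and local adaptation efficiently; only lightweight adapters are tuned, and the pre-trained model itself is left unchanged. In addition, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures that dense local features suitable for local matching are produced and avoids time-consuming spatial verification during re-ranking. Experimental results show that our method outperforms state-of-the-art methods with less training data and less training time, and needs only about 3% of the retrieval runtime of two-stage VPR methods that use RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is available at https://github.com/Lu-Feng/SelaVPR.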
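The summary describes re-ranking by matching dense local features instead of running spatial verification. As a rough illustration of the general idea of mutual nearest neighbor matching, the sketch below counts mutual matches between a query image and each candidate and sorts the candidates by that count. This is not the authors' implementation or their loss: the function names `mutual_nn_matches` and `rerank`, the use of cosine similarity on L2-normalized features, and scoring by raw match count are all assumptions made for this example.

```python
import numpy as np

def mutual_nn_matches(query_feats, cand_feats):
    """Return index pairs (i, j) that are mutual nearest neighbors.

    query_feats: (N, D) array, cand_feats: (M, D) array.
    Rows are assumed to be L2-normalized local descriptors (an assumption
    of this sketch), so the dot product is the cosine similarity.
    """
    sim = query_feats @ cand_feats.T      # (N, M) similarity matrix
    nn_q2c = sim.argmax(axis=1)           # best candidate feature per query feature
    nn_c2q = sim.argmax(axis=0)           # best query feature per candidate feature
    idx = np.arange(sim.shape[0])
    mutual = nn_c2q[nn_q2c] == idx        # keep pairs that choose each other
    return np.stack([idx[mutual], nn_q2c[mutual]], axis=1)

def rerank(query_feats, candidates):
    """Order candidate indices by number of mutual matches, highest first.

    Hypothetical scoring rule for illustration; no geometric check is used.
    """
    scores = [len(mutual_nn_matches(query_feats, c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda k: -scores[k])
```

In a two-stage pipeline, `candidates` would hold the local features of the top-k images returned by global-feature retrieval. Because the score is only a similarity matrix and two `argmax` calls, it avoids the iterative model fitting that RANSAC-based verification needs.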

Abstract (original)

Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model pre-training and VPR, how to bridge the gap and fully unleash the capability of pre-trained models for VPR is still a key issue to address. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method to achieve both global and local adaptation efficiently, in which only lightweight adapters are tuned without adjusting the pre-trained model. Besides, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses about only 3% retrieval runtime of the two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.

arXiv information

Authors	Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan
Published	2024-04-03 14:59:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
