Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers

Abstract

In this work we propose a novel joint training method for Visual Place Recognition (VPR), which simultaneously learns a global descriptor and a pair classifier for re-ranking. The pair classifier predicts whether a given pair of images is from the same place. The network comprises only Vision Transformer components for both the encoder and the pair classifier, and both components are trained using their respective class tokens. In existing VPR methods, the network is typically initialized using pre-trained weights from a generic image dataset such as ImageNet. In this work we propose an alternative pre-training strategy, using Siamese Masked Image Modelling as a pre-training task. We propose a Place-aware image sampling procedure from a collection of large VPR datasets for pre-training our model, to learn visual features tuned specifically for VPR. By re-using the Masked Image Modelling encoder and decoder weights in the second stage of training, Pair-VPR achieves state-of-the-art VPR performance across five benchmark datasets with a ViT-B encoder, and localization recall improves further with larger encoders. The Pair-VPR website is: https://csiro-robotics.github.io/Pair-VPR.
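The abstract describes a two-stage use of the network: a global descriptor shortlists candidate database images, and a pair classifier re-ranks that shortlist. The sketch below illustrates a generic retrieve-then-re-rank loop under stated assumptions. Cosine similarity over global descriptors builds the top-k shortlist, and a plain stand-in function `pair_score` replaces Pair-VPR's ViT pair classifier. All names here (`retrieve_and_rerank`, `pair_score`, the toy descriptors and scores) are hypothetical and are not taken from the Pair-VPR code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_and_rerank(
    query_desc: Vector,
    db_descs: Dict[str, Vector],
    pair_score: Callable[[str], float],  # hypothetical stand-in for a pair classifier
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Stage 1: rank the whole database by global-descriptor similarity, keep top-k.
    shortlist = sorted(
        db_descs,
        key=lambda name: cosine_similarity(query_desc, db_descs[name]),
        reverse=True,
    )[:top_k]
    # Stage 2: re-score only the shortlist with the (more expensive) pair scorer,
    # which is assumed to return a "same place" probability for query vs. candidate.
    rescored = [(name, pair_score(name)) for name in shortlist]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy usage: invented 3-D descriptors and invented classifier outputs.
db = {
    "place_a": [1.0, 0.0, 0.0],
    "place_b": [0.9, 0.1, 0.0],
    "place_c": [0.0, 1.0, 0.0],
    "place_d": [0.8, 0.0, 0.2],
}
query = [1.0, 0.05, 0.0]
fake_scores = {"place_a": 0.2, "place_b": 0.95, "place_c": 0.1, "place_d": 0.6}
ranking = retrieve_and_rerank(query, db, fake_scores.__getitem__, top_k=3)
print(ranking)  # [('place_b', 0.95), ('place_d', 0.6), ('place_a', 0.2)]
```

In this toy run, `place_a` is the nearest neighbour by descriptor but drops to last after re-ranking. `place_c` never reaches the pair scorer because it falls outside the top-k shortlist. This reflects the usual design trade-off: the cheap global descriptor bounds which candidates the costlier pairwise model ever sees.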

arXiv information

Authors	Stephen Hausler, Peyman Moghadam
Published	2025-03-02 08:59:29+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, DeepL
