Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers

Summary

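This work proposes a new joint training method for visual place recognition (VPR) that simultaneously learns a global descriptor and a pair classifier for re-ranking.
The pair classifier predicts whether a given pair of images comes from the same place.
The network consists solely of Vision Transformer components, for both the encoder and the pair classifier, and each component is trained using its own class token.
In existing VPR methods, the network is typically initialized with weights pre-trained on a generic image dataset such as ImageNet.
This work instead proposes an alternative pre-training strategy that uses Siamese Masked Image Modelling as the pre-training task.
To learn visual features tuned specifically for VPR, it proposes a place-aware image sampling procedure that draws pre-training samples from a collection of large VPR datasets.
By reusing the Masked Image Modelling encoder and decoder weights in the second stage of training, Pair-VPR achieves state-of-the-art VPR performance across five benchmark datasets with a ViT-B encoder, and larger encoders further improve localization recall.
The Pair-VPR website is https://csiro-robotics.github.io/Pair-VPR.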
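The retrieve-then-re-rank flow described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the Pair-VPR implementation: global descriptors are plain lists of floats compared by cosine similarity, `top_k` is a hypothetical shortlist size, and `pair_score` is a hypothetical stand-in for the pair classifier that returns a same-place score for the query paired with database image `i`.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve_and_rerank(
    query_desc: Sequence[float],
    db_descs: Sequence[Sequence[float]],
    pair_score: Callable[[int], float],  # hypothetical pair-classifier stand-in
    top_k: int = 100,  # hypothetical shortlist size
) -> List[int]:
    """Return database indices, best match first."""
    # Stage 1: rank every database image by global-descriptor similarity.
    ranked = sorted(
        range(len(db_descs)),
        key=lambda i: cosine(query_desc, db_descs[i]),
        reverse=True,
    )
    # Stage 2: re-order only the top-k shortlist by the pair classifier's score.
    shortlist = sorted(ranked[:top_k], key=pair_score, reverse=True)
    # Candidates outside the shortlist keep their global-descriptor order.
    return shortlist + ranked[top_k:]


db = [[1, 0], [0.9, 0.1], [0, 1], [0.7, 0.7]]
scores = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.1}  # hypothetical pair-classifier outputs
print(retrieve_and_rerank([1, 0.05], db, lambda i: scores[i], top_k=2))  # prints [1, 0, 3, 2]
```

In this toy run, global similarity alone ranks the database as `[0, 1, 3, 2]`; re-ranking the two-item shortlist swaps the first two entries. Only the shortlist is re-scored because a pairwise classifier must run once per query-candidate pair, which costs far more than a descriptor comparison.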

Summary (original)

In this work we propose a novel joint training method for Visual Place Recognition (VPR), which simultaneously learns a global descriptor and a pair classifier for re-ranking. The pair classifier can predict whether a given pair of images are from the same place or not. The network only comprises Vision Transformer components for both the encoder and the pair classifier, and both components are trained using their respective class tokens. In existing VPR methods, typically the network is initialized using pre-trained weights from a generic image dataset such as ImageNet. In this work we propose an alternative pre-training strategy, by using Siamese Masked Image Modelling as a pre-training task. We propose a Place-aware image sampling procedure from a collection of large VPR datasets for pre-training our model, to learn visual features tuned specifically for VPR. By re-using the Masked Image Modelling encoder and decoder weights in the second stage of training, Pair-VPR can achieve state-of-the-art VPR performance across five benchmark datasets with a ViT-B encoder, along with further improvements in localization recall with larger encoders. The Pair-VPR website is: https://csiro-robotics.github.io/Pair-VPR.
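The place-aware sampling step for Siamese pre-training can be sketched as below. This is an illustration under assumptions, not the authors' procedure: it assumes images are already grouped by a place identifier (the `places` dict and its file names are hypothetical), draws two distinct images of the same place as a Siamese pair, and keeps the first view visible while heavily masking the second. That split is a common setup for Siamese masked image modelling; whether Pair-VPR uses exactly this split is an assumption here. The 0.9 mask ratio is a hypothetical value, and 196 patches corresponds to a 224×224 image split into 16×16 patches.

```python
import random
from typing import Dict, List, Tuple


def sample_place_pair(places: Dict[str, List[str]], rng: random.Random) -> Tuple[str, str]:
    """Draw two distinct images that share a (hypothetical) place label."""
    # Only places with at least two images can yield a same-place pair.
    eligible = [p for p, imgs in places.items() if len(imgs) >= 2]
    if not eligible:
        raise ValueError("need at least one place with two or more images")
    place = rng.choice(eligible)
    first, second = rng.sample(places[place], 2)
    return first, second


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Per-patch mask where True means the patch is hidden from the encoder."""
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def make_pretraining_sample(places, rng, num_patches=196, mask_ratio=0.9):
    """One assumed Siamese MIM sample: first view visible, second view mostly masked."""
    first, second = sample_place_pair(places, rng)
    return first, second, random_patch_mask(num_patches, mask_ratio, rng)


rng = random.Random(0)
places = {"place_a": ["a_2019.jpg", "a_2021.jpg"], "place_b": ["b_day.jpg"]}
visible, target, mask = make_pretraining_sample(places, rng)
print(visible, target, sum(mask), "of", len(mask), "patches masked")
```

Sampling both views from the same place, rather than augmenting a single image, forces the model to relate appearance changes such as season or time of day to a shared location, which is the kind of feature VPR needs.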

arXiv information

Authors	Stephen Hausler, Peyman Moghadam
Published	2024-10-09 07:09:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google