Multi-Level Embedding and Alignment Network with Consistency and Invariance Learning for Cross-View Geo-Localization

Summary

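Cross-View Geo-Localization (CVGL) determines the location of a drone image by retrieving the most similar GPS-tagged satellite image.
However, the imaging gap between platforms is often significant and viewpoint variation is substantial, which limits the ability of existing methods to associate cross-view features effectively and to extract consistent, invariant characteristics.
Existing methods also often overlook the increase in computational and storage requirements that comes with improving model performance.
To address these limitations, we propose a lightweight enhanced alignment network called the Multi-Level Embedding and Alignment Network (MEAN).
MEAN uses a progressive multi-level enhancement strategy, global-to-local associations, and cross-domain alignment to enable feature communication across levels.
This allows it to connect features at different levels effectively and to learn robust cross-view consistent mappings and modality-invariant features.
In addition, MEAN adopts a shallow backbone network combined with a lightweight branch design, which reduces parameter count and computational complexity.
Experimental results on the University-1652 and SUES-200 datasets show that, compared with state-of-the-art models, MEAN reduces parameter count by 62.17% and computational complexity by 70.99% while maintaining competitive or even superior performance.
Our code and models will be released at https://github.com/ISChenawei/MEAN.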
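
The retrieval step that defines CVGL can be illustrated with a minimal sketch. It is not the MEAN implementation: it assumes each image has already been mapped to an embedding vector by some network, and it assumes cosine similarity as the matching score. The function names, the toy embeddings, and the GPS coordinates are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def localize(drone_embedding, satellite_gallery):
    """Return the GPS tag of the satellite image most similar to the drone image.

    satellite_gallery is a list of (gps_tag, embedding) pairs. In a real system
    the embeddings would come from a trained cross-view model; here they are
    hand-written toy values.
    """
    best_gps, _ = max(
        satellite_gallery,
        key=lambda item: cosine_similarity(drone_embedding, item[1]),
    )
    return best_gps


# Hypothetical gallery of GPS-tagged satellite embeddings (toy 3-D vectors).
gallery = [
    ((34.25, 108.98), [0.9, 0.1, 0.0]),
    ((31.23, 121.47), [0.1, 0.8, 0.3]),
    ((39.90, 116.40), [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of a drone-view query image.
query = [0.2, 0.7, 0.4]

print(localize(query, gallery))
```

In this toy setup the query is closest to the second gallery entry, so its GPS tag is returned. A cross-view model such as the one described above aims to produce embeddings in which a drone image and the satellite image of the same place end up close under this kind of score.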

Summary (original)

Cross-View Geo-Localization (CVGL) involves determining the localization of drone images by retrieving the most similar GPS-tagged satellite images. However, the imaging gaps between platforms are often significant and the variations in viewpoints are substantial, which limits the ability of existing methods to effectively associate cross-view features and extract consistent and invariant characteristics. Moreover, existing methods often overlook the problem of increased computational and storage requirements when improving model performance. To handle these limitations, we propose a lightweight enhanced alignment network, called the Multi-Level Embedding and Alignment Network (MEAN). The MEAN network uses a progressive multi-level enhancement strategy, global-to-local associations, and cross-domain alignment, enabling feature communication across levels. This allows MEAN to effectively connect features at different levels and learn robust cross-view consistent mappings and modality-invariant features. Moreover, MEAN adopts a shallow backbone network combined with a lightweight branch design, effectively reducing parameter count and computational complexity. Experimental results on the University-1652 and SUES-200 datasets demonstrate that MEAN reduces parameter count by 62.17% and computational complexity by 70.99% compared to state-of-the-art models, while maintaining competitive or even superior performance. Our code and models will be released on https://github.com/ISChenawei/MEAN.

arXiv information

Authors	Zhongwei Chen, Zhao-Xu Yang, Hai-Jun Rong
Published	2025-04-14 14:54:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
