ChangeViT: Unleashing Plain Vision Transformers for Change Detection

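Summary

Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface.
Vision transformers (ViTs) have succeeded as backbones in numerous computer vision applications, yet they remain underutilized in change detection, where convolutional neural networks (CNNs) still dominate thanks to their powerful feature extraction capabilities.
In this paper, our study reveals a unique advantage of ViTs: discerning large-scale changes, a capability where CNNs fall short.
Building on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve performance on large-scale changes.
The framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning.
This feature integration lets ChangeViT excel both at detecting large-scale changes and at capturing fine-grained details, providing comprehensive change detection across diverse scales.
Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (OSCD).
These results underscore the unleashed potential of plain ViTs for change detection.
Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of the approach.
The source code is available at https://github.com/zhuduowang/ChangeViT.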

Abstract (original)

Change detection in remote sensing images is essential for tracking environmental changes on the Earth’s surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, our study uncovers ViTs’ unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to enhance the performance of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach. The source code is available at https://github.com/zhuduowang/ChangeViT.

arXiv information

Authors	Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng
Published	2024-06-18 17:59:08+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
