Remote Sensing Scene Classification with Masked Image Modeling (MIM)

Summary

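Remote sensing scene classification has been studied extensively because of its critical role in geological surveying, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring.
Until now, machine learning (ML) methods for this task have mainly relied on backbones pretrained with supervised learning (SL).
Masked image modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way to learn visual feature representations, so it offers a new opportunity to improve ML performance on scene classification.
This study explores the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31.
Compared with published benchmarks, MIM-pretrained Vision Transformer (ViT) backbones outperform the alternatives (by up to 18% in top-1 accuracy), and MIM learns better feature representations than its supervised learning counterparts (by up to 5% in top-1 accuracy).
Furthermore, general-purpose MIM-pretrained ViTs achieve performance competitive with the specially designed but complex Transformer for Remote Sensing (TRS) framework.
The experimental results also provide a performance baseline for future studies.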

Abstract (original)

Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, the Machine Learning (ML) methods for performing the task mainly used the backbones pretrained in the manner of supervised learning (SL). As Masked Image Modeling (MIM), a self-supervised learning (SSL) technique, has been shown as a better way for learning visual feature representation, it presents a new opportunity for improving ML performance on the scene classification task. This research aims to explore the potential of MIM pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared to the published benchmarks, we show that the MIM pretrained Vision Transformer (ViTs) backbones outperform other alternatives (up to 18% on top 1 accuracy) and that the MIM technique can learn better feature representation than the supervised learning counterparts (up to 5% on top 1 accuracy). Moreover, we show that the general-purpose MIM-pretrained ViTs can achieve competitive performance as the specially designed yet complicated Transformer for Remote Sensing (TRS) framework. Our experiment results also provide a performance baseline for future studies.

arXiv information

Authors	Liya Wang, Alex Tien
Published	2023-03-24 17:43:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
