Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models

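Summary

This study aims to identify the most effective pre-trained model for land use classification in onboard satellite processing, with emphasis on high accuracy, computational efficiency, and robustness to the noisy data conditions commonly encountered during satellite-based inference.
Through extensive experiments, we compare the performance of traditional CNN-based models, ResNet-based models, and a range of pre-trained Vision Transformer models.
Our findings show that pre-trained Vision Transformer (ViT) models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in both accuracy and efficiency.
These models achieve high performance with lower computational requirements and are more resilient during inference under noisy conditions.
MobileViTV2 performs best on clean validation data, but EfficientViT-M2 proves more robust when handling noise, which makes it the most suitable model for onboard satellite EO tasks.
Our experimental results indicate that EfficientViT-M2 is the best choice for reliable and efficient RS-IC in satellite operations, achieving 98.76% accuracy, precision, and recall.
Specifically, EfficientViT-M2 delivers the highest performance across all metrics, excels in training efficiency (1,000 s) and inference time (10 s), and shows greater robustness (overall robustness score of 0.79).
In terms of power, EfficientViT-M2 consumes 63.93% less than MobileViTV2 (79.23 W) and 73.26% less than SwinTransformer (108.90 W).
This highlights its significant advantage in energy efficiency.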

Summary (original)

This study focuses on identifying the most effective pre-trained model for land use classification in onboard satellite processing, emphasizing achieving high accuracy, computational efficiency, and robustness against noisy data conditions commonly encountered during satellite-based inference. Through extensive experimentation, we compare the performance of traditional CNN-based, ResNet-based, and various pre-trained vision Transformer models. Our findings demonstrate that pre-trained Vision Transformer (ViT) models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in terms of accuracy and efficiency. These models achieve high performance with reduced computational requirements and exhibit greater resilience during inference under noisy conditions. While MobileViTV2 has excelled on clean validation data, EfficientViT-M2 has proved more robust when handling noise, making it the most suitable model for onboard satellite EO tasks. Our experimental results demonstrate that EfficientViT-M2 is the optimal choice for reliable and efficient RS-IC in satellite operations, achieving 98.76 % of accuracy, precision, and recall. Precisely, EfficientViT-M2 delivers the highest performance across all metrics, excels in training efficiency (1,000s) and inference time (10s), and demonstrates greater robustness (overall robustness score of 0.79). Consequently, EfficientViT-M2 consumes 63.93 % less power than MobileViTV2 (79.23 W) and 73.26 % less power than SwinTransformer (108.90 W). This highlights its significant advantage in energy efficiency.
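The abstract reports EfficientViT-M2's power draw only as a percentage reduction relative to two baselines. As a rough sanity check, the short sketch below back-computes the absolute power each statement implies. The baseline wattages and percentages are taken from the abstract; the helper name `implied_power` and the dictionary layout are illustrative assumptions, not anything from the paper.

```python
# Figures quoted in the abstract: baseline power draw (W) and the stated
# percentage reduction achieved by EfficientViT-M2 relative to that baseline.
baselines = {
    "MobileViTV2": (79.23, 63.93),
    "SwinTransformer": (108.90, 73.26),
}


def implied_power(baseline_watts, reduction_percent):
    """Power remaining after removing reduction_percent of baseline_watts.

    Hypothetical helper for illustration only.
    """
    return baseline_watts * (1.0 - reduction_percent / 100.0)


implied = {
    name: implied_power(watts, pct) for name, (watts, pct) in baselines.items()
}

for name, watts in implied.items():
    print(f"Implied EfficientViT-M2 power via {name}: {watts:.2f} W")
# Implied EfficientViT-M2 power via MobileViTV2: 28.58 W
# Implied EfficientViT-M2 power via SwinTransformer: 29.12 W
```

The two statements imply roughly 28.6 W and 29.1 W, which are close but not identical. The abstract does not state the absolute figure, so these values are only what the quoted percentages imply, and the small gap may come from rounding or from how the measurements were reported.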

arXiv information

Authors	Thanh-Dung Le, Vu Nguyen Ha, Ti Ti Nguyen, Geoffrey Eappen, Prabhu Thiruvasagam, Hong-fu Chou, Duc-Dung Tran, Hung Nguyen-Kha, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas
Published	2025-04-22 14:51:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
