Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos

Abstract

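Adversarial training (AT) has been shown to substantially improve adversarial robustness through min-max optimization.
In video recognition, however, two main challenges limit its effectiveness.
First, fast adversarial training for video models remains largely unexplored, which severely hinders practical use.
Specifically, most video adversarial training methods are computationally expensive, with long training times and high cost.
Second, existing methods struggle with the trade-off between clean accuracy and adversarial robustness.
To address these challenges, the paper introduces Video Fast Adversarial Training with Weak-to-Strong consistency (VFAT-WS), the first fast adversarial training method for video data.
VFAT-WS has two key designs. First, it combines a simple yet effective temporal frequency augmentation (TF-AUG) and its spatial-temporal enhanced form, STF-AUG, with a single-step PGD attack to improve training efficiency and robustness.
Second, it devises a weak-to-strong spatial-temporal consistency regularization that seamlessly integrates the simpler TF-AUG and the more complex STF-AUG.
This consistency regularization steers learning from simple to complex augmentations.
Together, the two designs achieve a better trade-off between clean accuracy and robustness.
Extensive experiments on UCF-101 and HMDB-51 with both CNN and Transformer-based models show that VFAT-WS greatly improves adversarial robustness and corruption robustness while accelerating training by nearly 490%.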
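
To make the "single-step PGD attack" concrete, here is a minimal NumPy sketch of one signed-gradient step inside an L-infinity ball. It is not the paper's implementation: the toy linear softmax classifier, the random start, the step size equal to `eps`, the value `eps = 8/255`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_and_input_grad(W, b, x, label):
    """Loss of a toy linear softmax classifier and its gradient w.r.t. the input."""
    p = softmax(W @ x.ravel() + b)
    loss = -np.log(p[label])
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad = (W.T @ (p - onehot)).reshape(x.shape)
    return loss, grad

def single_step_attack(W, b, video, label, eps=8 / 255, rng=None):
    """One signed-gradient step from a random start (assumed variant)."""
    rng = np.random.default_rng(0) if rng is None else rng
    delta = rng.uniform(-eps, eps, size=video.shape)          # random start in the ball
    x0 = np.clip(video + delta, 0.0, 1.0)
    _, grad = cross_entropy_and_input_grad(W, b, x0, label)
    delta = np.clip(delta + eps * np.sign(grad), -eps, eps)   # step, then project back
    return np.clip(video + delta, 0.0, 1.0)                   # keep valid pixel range

# Toy clip: 4 frames, 3 channels, 8x8 pixels (hypothetical sizes).
rng = np.random.default_rng(0)
video = rng.uniform(0.2, 0.8, size=(4, 3, 8, 8))
W = rng.normal(scale=0.05, size=(5, video.size))
b = np.zeros(5)
label = 2

adv = single_step_attack(W, b, video, label, rng=rng)
clean_loss, _ = cross_entropy_and_input_grad(W, b, video, label)
adv_loss, _ = cross_entropy_and_input_grad(W, b, adv, label)
```

A multi-step PGD attack would repeat the gradient step several times per batch; using a single step is what makes this family of methods cheap, at the price of a weaker inner maximization.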

Abstract (original)

Adversarial Training (AT) has been shown to significantly enhance adversarial robustness via a min-max optimization approach. However, its effectiveness in video recognition tasks is hampered by two main challenges. First, fast adversarial training for video models remains largely unexplored, which severely impedes its practical applications. Specifically, most video adversarial training methods are computationally costly, with long training times and high expenses. Second, existing methods struggle with the trade-off between clean accuracy and adversarial robustness. To address these challenges, we introduce Video Fast Adversarial Training with Weak-to-Strong consistency (VFAT-WS), the first fast adversarial training method for video data. Specifically, VFAT-WS incorporates the following key designs: First, it integrates a straightforward yet effective temporal frequency augmentation (TF-AUG), and its spatial-temporal enhanced form STF-AUG, along with a single-step PGD attack to boost training efficiency and robustness. Second, it devises a weak-to-strong spatial-temporal consistency regularization, which seamlessly integrates the simpler TF-AUG and the more complex STF-AUG. Leveraging the consistency regularization, it steers the learning process from simple to complex augmentations. Both of them work together to achieve a better trade-off between clean accuracy and robustness. Extensive experiments on UCF-101 and HMDB-51 with both CNN and Transformer-based models demonstrate that VFAT-WS achieves great improvements in adversarial robustness and corruption robustness, while accelerating training by nearly 490%.
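
The abstract does not spell out how TF-AUG, STF-AUG, or the consistency term are computed, so the following is only a rough sketch of the general idea under stated assumptions: the weak view jitters amplitudes of the temporal spectrum, the strong view jitters amplitudes over time and space, and the consistency loss is a KL divergence that uses the weak view's prediction as the target. The function names, the amplitude-jitter scheme, the `strength` parameter, and the KL direction are all hypothetical.

```python
import numpy as np

def temporal_freq_aug(video, strength, rng):
    """Hypothetical weak view: jitter amplitudes of the temporal spectrum only."""
    spec = np.fft.rfft(video, axis=0)                  # FFT over the frame axis
    scale = 1.0 + strength * rng.uniform(-1, 1, size=(spec.shape[0], 1, 1, 1))
    return np.fft.irfft(spec * scale, n=video.shape[0], axis=0)

def spatiotemporal_freq_aug(video, strength, rng):
    """Hypothetical strong view: jitter amplitudes over time, height and width."""
    axes = (0, 2, 3)
    spec = np.fft.rfftn(video, axes=axes)
    shape = spec.shape[:1] + (1,) + spec.shape[2:]     # share the jitter across channels
    scale = 1.0 + strength * rng.uniform(-1, 1, size=shape)
    return np.fft.irfftn(spec * scale, s=[video.shape[a] for a in axes], axes=axes)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def consistency_loss(logits_weak, logits_strong):
    """KL(p_weak || p_strong): the weak view serves as the target (assumed direction)."""
    p, q = softmax(logits_weak), softmax(logits_strong)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy clip with layout (frames, channels, height, width); sizes are hypothetical.
rng = np.random.default_rng(0)
video = rng.uniform(0.0, 1.0, size=(8, 3, 16, 16))
W = rng.normal(scale=0.05, size=(5, video.size))       # stand-in linear classifier

weak = temporal_freq_aug(video, 0.1, rng)
strong = spatiotemporal_freq_aug(video, 0.3, rng)
loss = consistency_loss(W @ weak.ravel(), W @ strong.ravel())
```

In a training loop, a term like `loss` would be added to the adversarial classification loss with some weight; how the paper weights or schedules it is not stated in the abstract.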

arXiv information

Authors	Songping Wang, Hanqing Liu, Yueming Lyu, Xiantao Hu, Ziwen He, Wei Wang, Caifeng Shan, Liang Wang
Published	2025-04-23 13:22:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
