Towards Efficient Adversarial Training on Vision Transformers

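Summary

Vision Transformers (ViTs) have attracted considerable attention as a powerful alternative to convolutional neural networks (CNNs).
Recent work has shown that ViTs, like CNNs, are vulnerable to adversarial examples.
An intuitive way to build robust ViTs is to apply adversarial training, which has proven to be one of the most effective ways to obtain robust CNNs.
However, a major limitation of adversarial training is its heavy computational cost.
The self-attention mechanism used by ViTs is computationally intensive, with a cost that grows quadratically with the number of input patches, which makes adversarial training on ViTs even more time-consuming.
In this work, we first comprehensively study fast adversarial training on a variety of vision transformers and illustrate the relationship between efficiency and robustness.
Then, to speed up adversarial training on ViTs, we propose an efficient Attention Guided Adversarial Training mechanism.
Specifically, relying on the properties of self-attention, we actively remove certain patch embeddings at each layer during adversarial training using an attention-guided dropping strategy.
The slimmed self-attention modules significantly accelerate adversarial training on ViTs.
With only 65% of the fast adversarial training time, the method matches state-of-the-art results on the challenging ImageNet benchmark.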

Abstract (original)

Vision Transformer (ViT), as a powerful alternative to Convolutional Neural Network (CNN), has received much attention. Recent work showed that ViTs, like CNNs, are also vulnerable to adversarial examples. To build robust ViTs, an intuitive way is to apply adversarial training, since it has been shown to be one of the most effective ways to obtain robust CNNs. However, one major limitation of adversarial training is its heavy computational cost. The self-attention mechanism adopted by ViTs is a computationally intense operation whose expense increases quadratically with the number of input patches, making adversarial training on ViTs even more time-consuming. In this work, we first comprehensively study fast adversarial training on a variety of vision transformers and illustrate the relationship between efficiency and robustness. Then, to expedite adversarial training on ViTs, we propose an efficient Attention Guided Adversarial Training mechanism. Specifically, relying on the specialty of self-attention, we actively remove certain patch embeddings of each layer with an attention-guided dropping strategy during adversarial training. The slimmed self-attention modules accelerate the adversarial training on ViTs significantly. With only 65% of the fast adversarial training time, we match the state-of-the-art results on the challenging ImageNet benchmark.
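
The abstract does not spell out how patch embeddings are selected for removal, so the following is only a minimal NumPy sketch of what an attention-guided dropping step could look like, not the authors' implementation. It assumes a single attention head, a class token in row 0, and a scoring rule that ranks each patch by the mean attention it receives; the names `self_attention_probs`, `attention_guided_drop`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_probs(tokens, w_q, w_k):
    """Single-head attention probabilities of shape (T, T).

    The (T, T) score matrix is the part whose cost grows
    quadratically with the number of tokens T.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1)


def attention_guided_drop(tokens, attn, keep_ratio):
    """Keep the class token plus the most-attended patch tokens.

    tokens: (T, D) array, row 0 is the class token (assumption).
    attn:   (T, T) row-stochastic attention matrix.
    Returns the kept tokens and their original row indices.
    """
    num_patches = tokens.shape[0] - 1
    num_keep = max(1, int(round(keep_ratio * num_patches)))
    # Assumed scoring rule: mean attention each patch receives
    # over all query positions. The paper may use a different rule.
    received = attn[:, 1:].mean(axis=0)
    top = np.argsort(-received, kind="stable")[:num_keep]
    # Restore the original patch order and always keep the class token.
    keep = np.concatenate(([0], np.sort(top) + 1))
    return tokens[keep], keep


rng = np.random.default_rng(0)
T, D = 1 + 196, 64  # class token + 14x14 patches, toy embedding width
tokens = rng.normal(size=(T, D))
w_q = rng.normal(size=(D, D)) / np.sqrt(D)
w_k = rng.normal(size=(D, D)) / np.sqrt(D)

attn = self_attention_probs(tokens, w_q, w_k)
kept_tokens, kept_idx = attention_guided_drop(tokens, attn, keep_ratio=0.5)
print(kept_tokens.shape)  # (99, 64)
# Relative size of the attention matrix in the next layer.
print((kept_tokens.shape[0] / T) ** 2)
```

Because the attention matrix has one entry per pair of tokens, keeping roughly half of the tokens shrinks it to roughly a quarter of its size in the following layer. This is the intuition behind the speed-up, although the end-to-end saving reported in the abstract also depends on the rest of the network and on the drop schedule, neither of which is modelled here.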

arXiv information

Authors	Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
Published	2022-07-21 14:23:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
