Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training

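Summary

Adversarial training has been widely explored as a way to mitigate attacks against deep models.
However, most existing methods remain caught in a dilemma between accuracy and robustness, because they tend to fit the model to robust features (those that adversaries cannot easily tamper with) while ignoring features that are non-robust but highly predictive.
To achieve a better robustness-accuracy trade-off, we propose Vanilla Feature Distillation Adversarial Training (VFD-Adv), which performs knowledge distillation from a pre-trained model (optimized for high accuracy) to guide adversarial training toward higher accuracy, that is, toward preserving those non-robust but predictive features.
More specifically, both adversarial examples and their clean counterparts are forced to align in feature space by distilling predictive representations from the pre-trained (clean) model, whereas previous work barely uses predictive features from clean models.
As a result, the adversarially trained model is updated so that it preserves as much accuracy as possible while gaining robustness.
A key advantage of our method is that it can be applied universally to existing methods and improve them.
Exhaustive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of the proposed method.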

Abstract (original)

Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works are still trapped in the dilemma between higher accuracy and stronger robustness since they tend to fit a model towards robust features (not easily tampered with by adversaries) while ignoring those non-robust but highly predictive features. To achieve a better robustness-accuracy trade-off, we propose the Vanilla Feature Distillation Adversarial Training (VFD-Adv), which conducts knowledge distillation from a pre-trained model (optimized towards high accuracy) to guide adversarial training towards higher accuracy, i.e., preserving those non-robust but predictive features. More specifically, both adversarial examples and their clean counterparts are forced to be aligned in the feature space by distilling predictive representations from the pre-trained/clean model, while previous works barely utilize predictive features from clean models. Therefore, the adversarial training model is updated towards maximally preserving the accuracy as gaining robustness. A key advantage of our method is that it can be universally adapted to and boost existing works. Exhaustive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of our proposed method.
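
The abstract describes the objective only in words, so the following is a minimal sketch of what a feature-alignment distillation loss of this kind could look like, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`squared_l2`, `cross_entropy`, `vfd_style_loss`), the use of a mean squared distance for the alignment terms, the cross-entropy term on the adversarial example, and the weighting parameter `distill_weight`. The abstract does not state the distance, the weighting, or the layer from which features are taken.

```python
import math


def squared_l2(u, v):
    """Mean squared difference between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def vfd_style_loss(student_logits_adv, label,
                   student_feat_adv, student_feat_clean,
                   teacher_feat_clean, distill_weight=1.0):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    - robust_term: classification loss of the adversarially trained
      (student) model on the adversarial example.
    - align_adv / align_clean: pull the student's features for the
      adversarial and the clean input toward the features that the
      pre-trained clean (teacher) model produces for the clean input.
    """
    robust_term = cross_entropy(student_logits_adv, label)
    align_adv = squared_l2(student_feat_adv, teacher_feat_clean)
    align_clean = squared_l2(student_feat_clean, teacher_feat_clean)
    return robust_term + distill_weight * (align_adv + align_clean)


if __name__ == "__main__":
    # Toy values only; real features would come from network layers.
    loss = vfd_style_loss(
        student_logits_adv=[0.0, 0.0], label=0,
        student_feat_adv=[1.0, 2.0],
        student_feat_clean=[1.0, 4.0],
        teacher_feat_clean=[1.0, 4.0],
        distill_weight=0.5,
    )
    print(loss)
```

In this sketch the teacher's features are treated as fixed targets, which matches the abstract's description of distilling from a pre-trained model; when the student's features already match the teacher's, both alignment terms vanish and only the classification term remains.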

arXiv information

Authors	Guodong Cao, Zhibo Wang, Xiaowei Dong, Zhifei Zhang, Hengchang Guo, Zhan Qin, Kui Ren
Published	2022-06-05 11:57:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
