Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Summary

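Adversarial training is the state-of-the-art approach for defending against adversarial examples (AEs), but it suffers from a trade-off between robustness and accuracy.
This work revisits representation-based invariance regularization for learning representations that are discriminative yet adversarially invariant, with the aim of mitigating this trade-off.
We empirically identify two key issues that hinder invariance regularization: (1) a "gradient conflict" between the invariance loss and the classification objective, which indicates the existence of "collapsing solutions", and (2) a mixture distribution problem arising from the diverged distributions of clean and adversarial inputs.
To address these issues, we propose Asymmetrically Representation-regularized Adversarial Training (AR-AT). It incorporates a stop-gradient operation and a predictor into the invariance loss to avoid "collapsing solutions", inspired by a recent non-contrastive self-supervised learning approach, and uses a split-BatchNorm (BN) structure to resolve the mixture distribution problem.
Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative power.
Furthermore, we discuss how our findings relate to knowledge-distillation-based defense methods, contributing to a deeper understanding of their relative successes.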
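
The asymmetric invariance loss described in this summary can be illustrated with a minimal sketch. The summary only states that a stop-gradient operation and a predictor are used; the cosine-distance form, the linear predictor, and all names below (`asymmetric_invariance_loss`, `predictor_weights`, `z_adv`, `z_clean`) are assumptions made for illustration, not the authors' implementation. Plain Python has no automatic differentiation, so the stop-gradient appears only as a comment marking where it would go.

```python
import math

def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine_distance(a, b):
    """Return 1 - cosine similarity; close to 0 when a and b are aligned."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def asymmetric_invariance_loss(z_adv, z_clean, predictor_weights):
    """Forward value of an asymmetric invariance loss (assumed form).

    Only the adversarial representation passes through the predictor.
    The clean representation is treated as a fixed target: in an autodiff
    framework it would be wrapped in a stop-gradient so that no gradient
    flows back into the clean branch.
    """
    prediction = linear(predictor_weights, z_adv)
    return cosine_distance(prediction, z_clean)

# Hypothetical 2-D representations with an identity predictor.
identity = [[1.0, 0.0], [0.0, 1.0]]
loss_same = asymmetric_invariance_loss([1.0, 2.0], [2.0, 4.0], identity)
loss_orth = asymmetric_invariance_loss([1.0, 0.0], [0.0, 1.0], identity)
```

The asymmetry is the point of the sketch: the two branches are not interchangeable, because only one side is transformed by the predictor and only that side would receive gradients.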

Summary (original)

Although adversarial training has been the state-of-the-art approach to defend against adversarial examples (AEs), they suffer from a robustness-accuracy trade-off. In this work, we revisit representation-based invariance regularization to learn discriminative yet adversarially invariant representations, aiming to mitigate this trade-off. We empirically identify two key issues hindering invariance regularization: (1) a ‘gradient conflict’ between invariance loss and classification objectives, indicating the existence of ‘collapsing solutions,’ and (2) the mixture distribution problem arising from diverged distributions of clean and adversarial inputs. To address these issues, we propose Asymmetrically Representation-regularized Adversarial Training (AR-AT), which incorporates a stop-gradient operation and a predictor in the invariance loss to avoid ‘collapsing solutions,’ inspired by a recent non-contrastive self-supervised learning approach, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative power. Furthermore, we discuss the relevance of our findings to knowledge-distillation-based defense methods, contributing to a deeper understanding of their relative successes.
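
The mixture distribution problem and the split-BatchNorm idea can be illustrated with a small sketch. It assumes that "split-BatchNorm" means clean and adversarial inputs are normalized with separate batch statistics; the abstract does not spell out the exact arrangement, so this reading is an assumption, and the function names and toy values are hypothetical. Learnable scale and shift parameters and running statistics are omitted for brevity.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit variance."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for row in batch
    ]

def split_batch_norm(clean_batch, adv_batch):
    """Normalize clean and adversarial features with separate statistics."""
    return batch_norm(clean_batch), batch_norm(adv_batch)

# Toy 1-D features whose clean and adversarial distributions have diverged.
clean = [[0.0], [2.0]]
adv = [[10.0], [14.0]]

clean_out, adv_out = split_batch_norm(clean, adv)

# Shared statistics over the mixture, for comparison.
mixed_out = batch_norm(clean + adv)
```

With shared statistics, every clean sample in this toy example ends up below the mixture mean, so the normalized clean features are no longer centered. With separate statistics, each group is centered on its own mean.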

arXiv information

Authors	Futa Waseda, Isao Echizen
Published	2024-02-22 15:53:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
