Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence

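Summary

The vulnerability of deep learning models to adversarial attacks has drawn increasing attention, especially when models are deployed in security-critical domains.
Numerous defenses, both reactive and proactive, have been proposed to improve model robustness.
Reactive defenses, such as applying transformations to remove perturbations, usually cannot handle large perturbations.
Proactive defenses that involve retraining suffer from attack dependency and high computational cost.
In this paper, we approach defense from the general effect that adversarial attacks have on neurons inside the model.
We introduce the concept of neuron influence, which quantitatively measures each neuron's contribution to correct classification.
We then observe that almost all attacks fool the model by suppressing neurons with larger influence and enhancing neurons with smaller influence.
Based on this observation, we propose \emph{Neuron-level Inverse Perturbation} (NIP), a new defense against general adversarial attacks.
NIP calculates neuron influence from benign examples and then modifies input examples by generating inverse perturbations, which in turn strengthen neurons with larger influence and weaken those with smaller influence.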

Abstract (original)

The vulnerabilities of deep learning models to adversarial attacks have attracted increasing attention, especially when models are deployed in security-critical domains. Numerous defense methods, including reactive and proactive ones, have been proposed to improve model robustness. Reactive defenses, such as conducting transformations to remove perturbations, usually fail to handle large perturbations. Proactive defenses that involve retraining suffer from attack dependency and high computation cost. In this paper, we consider defense methods from the general effect that adversarial attacks have on neurons inside the model. We introduce the concept of neuron influence, which can quantitatively measure neurons’ contribution to correct classification. Then, we observe that almost all attacks fool the model by suppressing neurons with larger influence and enhancing those with smaller influence. Based on this, we propose \emph{Neuron-level Inverse Perturbation} (NIP), a novel defense against general adversarial attacks. It calculates neuron influence from benign examples and then modifies input examples by generating inverse perturbations that can in turn strengthen neurons with larger influence and weaken those with smaller influence.
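
To make the idea in the abstract concrete, here is a minimal toy sketch in plain Python. The abstract does not give the influence formula or the perturbation procedure, so everything specific here is an assumption for illustration: the hand-picked weights `W1` and `W2`, the influence proxy (activation times the gradient of the true-class logit), the top-`k` split into "larger" and "smaller" influence neurons, and the single sign-gradient step of size `eps`. It should not be read as the authors' NIP implementation.

```python
# Toy sketch only: the influence proxy, the top-k split, and the single
# sign-gradient step are assumptions for illustration, not the paper's method.

W1 = [[1.0, -0.5], [-0.5, 1.0], [0.5, 0.5]]  # hidden x input (hypothetical weights)
W2 = [[1.0, -1.0, 0.2], [-1.0, 1.0, 0.2]]    # class x hidden (hypothetical weights)


def pre_activations(x):
    """Hidden-layer values before the ReLU."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def hidden(x):
    """Hidden-layer activations after the ReLU."""
    return [max(0.0, p) for p in pre_activations(x)]


def predict(x):
    """Index of the largest logit of the two-layer toy model."""
    h = hidden(x)
    z = [sum(w * hj for w, hj in zip(row, h)) for row in W2]
    return z.index(max(z))


def neuron_influence(benign, label):
    """Assumed proxy: mean over benign inputs of activation * d(logit)/d(activation)."""
    totals = [0.0] * len(W1)
    for x in benign:
        for j, hj in enumerate(hidden(x)):
            # The output layer is linear, so the gradient is simply W2[label][j].
            totals[j] += hj * W2[label][j]
    return [t / len(benign) for t in totals]


def neuron_signs(influence, k):
    """+1 for the k most influential neurons (strengthen), -1 for the rest (weaken)."""
    order = sorted(range(len(influence)), key=lambda j: influence[j], reverse=True)
    top = order[:k]
    return [1.0 if j in top else -1.0 for j in range(len(influence))]


def score(x, signs):
    """Signed sum of activations that the inverse perturbation tries to increase."""
    return sum(s * hj for s, hj in zip(signs, hidden(x)))


def inverse_perturb(x, signs, eps):
    """One sign-gradient ascent step on score() with respect to the input."""
    pre = pre_activations(x)
    grad = [
        sum(signs[j] * W1[j][i] for j in range(len(W1)) if pre[j] > 0.0)
        for i in range(len(x))
    ]
    step = [(g > 0) - (g < 0) for g in grad]  # sign of each gradient entry
    return [xi + eps * s for xi, s in zip(x, step)]


benign_class0 = [[1.0, 0.2], [0.8, 0.1]]  # hypothetical benign examples of class 0
influence = neuron_influence(benign_class0, label=0)
signs = neuron_signs(influence, k=1)
x_in = [0.4, 0.6]                          # hypothetical input to be corrected
x_out = inverse_perturb(x_in, signs, eps=0.3)
```

With these hand-picked weights, the step raises the signed activation score and moves the toy model's prediction for `x_in` from class 1 to class 0. That outcome is a property of this constructed example, not evidence about the method's performance.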

arXiv information

Authors	Ruoxi Chen, Haibo Jin, Haibin Zheng, Jinyin Chen, Zhenguang Liu
Published	2024-08-19 17:21:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
