Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations

Summary

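Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples (AEs), which are maliciously designed to fool a target model.
Normal examples (NEs) with imperceptible adversarial perturbations added can pose a security threat to DNNs.
Existing AE detection methods achieve high accuracy, but they do not exploit the information contained in the AEs they detect.
We therefore propose a model-free AE detection method based on high-dimensional perturbation extraction; the entire process requires no queries to the victim model.
Research shows that DNNs are sensitive to high-dimensional features.
The adversarial perturbation hidden in an adversarial example is a high-dimensional feature that is highly predictive and non-robust.
DNNs learn more detail from high-dimensional data than from other kinds of data.
In our method, a perturbation extractor extracts the adversarial perturbation from an AE as a high-dimensional feature, and a trained AE discriminator then determines whether the input is an AE.
Experimental results show that the proposed method not only detects adversarial examples with high accuracy but also identifies the specific category of each AE.
In addition, the extracted perturbation can be used to recover the AE to an NE.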
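
The extract, discriminate, and recover flow described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the abstract gives no architecture or formulas, so `toy_extractor` and `toy_discriminator` are hypothetical hand-written stand-ins for the paper's trained networks, and recovery is assumed to mean subtracting the extracted perturbation from the input and clipping to the valid pixel range.

```python
from typing import Callable, List, Tuple

Image = List[float]  # flattened image, pixel values in [0, 1]


def recover(adv: Image, perturbation: Image) -> Image:
    """Subtract the estimated perturbation and clip to the valid pixel range."""
    return [min(1.0, max(0.0, a - p)) for a, p in zip(adv, perturbation)]


def detect_and_recover(
    x: Image,
    extractor: Callable[[Image], Image],
    discriminator: Callable[[Image], bool],
) -> Tuple[bool, Image]:
    """Run extractor then discriminator; recover only when the input is flagged.

    The victim model is never called, which mirrors the model-free claim.
    """
    delta_hat = extractor(x)          # estimated adversarial perturbation
    is_ae = discriminator(delta_hat)  # decision is made on the perturbation
    return is_ae, (recover(x, delta_hat) if is_ae else list(x))


# --- Hypothetical stand-ins, NOT the paper's trained networks ---------------

def toy_extractor(x: Image) -> Image:
    """Assume clean pixels sit on a 0.25 grid; the residual is the 'perturbation'."""
    return [v - round(v * 4) / 4 for v in x]


def toy_discriminator(delta: Image, threshold: float = 0.01) -> bool:
    """Flag the input when any residual exceeds an arbitrary threshold."""
    return max(abs(d) for d in delta) > threshold


if __name__ == "__main__":
    adversarial = [0.02, 0.23, 0.52, 0.98]
    flagged, restored = detect_and_recover(
        adversarial, toy_extractor, toy_discriminator
    )
    print(flagged, restored)
```

The point of the sketch is the data flow: the discriminator sees only the extracted perturbation, and the same perturbation is reused for recovery. In the paper both components are learned, so the grid assumption and the threshold above should not be read as part of the method.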

Abstract (original)

Deep neural networks (DNNs) have been shown to be vulnerable against adversarial examples (AEs) which are maliciously designed to fool target models. The normal examples (NEs) added with imperceptible adversarial perturbation, can be a security threat to DNNs. Although the existing AEs detection methods have achieved a high accuracy, they failed to exploit the information of the AEs detected. Thus, based on high-dimension perturbation extraction, we propose a model-free AEs detection method, the whole process of which is free from querying the victim model. Research shows that DNNs are sensitive to the high-dimension features. The adversarial perturbation hiding in the adversarial example belongs to the high-dimension feature which is highly predictive and non-robust. DNNs learn more details from high-dimension data than others. In our method, the perturbation extractor can extract the adversarial perturbation from AEs as high-dimension feature, then the trained AEs discriminator determines whether the input is an AE. Experimental results show that the proposed method can not only detect the adversarial examples with high accuracy, but also detect the specific category of the AEs. Meanwhile, the extracted perturbation can be used to recover the AEs to NEs.

arXiv information

Authors	Mingyu Dong, Jiahao Chen, Diqun Yan, Jingxing Gao, Li Dong, Rangding Wang
Published	2022-08-30 14:17:12+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
