Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense

Summary

Title: Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense

Summary:

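– DNN models have achieved great success in many real-world applications, but their vulnerability to adversarial attacks has long been criticized. Many defenses have been proposed, yet the essential nature of adversarial examples remains unclear, and most existing methods are vulnerable to hybrid attacks and counterattacks.
– The paper first reveals a gradient-based correlation between sensitivity-analysis-based DNN interpreters and the generation process of adversarial examples. This exposes the Achilles' heel of adversarial attacks and offers a clue for linking two long-standing challenges of DNNs: fragility and unexplainability.
– It then proposes X-Ensemble, an interpreter-based ensemble framework. X-Ensemble adopts a novel detection-rectification process that builds multiple sub-detectors and a rectifier on various types of interpretation information about the target classifier. It also uses a Random Forests (RF) model to combine the sub-detectors into an ensemble detector that defends against adversarial hybrid attacks. Because RF is non-differentiable, it is a valuable choice against counterattacks.
– Extensive experiments across diverse attack scenarios show that X-Ensemble outperforms competitive baseline methods.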
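
The gradient-based correlation mentioned above can be illustrated with a small sketch. The summary does not name a specific interpreter or attack, so the choices here are assumptions made for illustration only: a gradient saliency map stands in for the sensitivity-analysis interpreter, an FGSM-style step stands in for the attack, and a toy logistic model with random weights stands in for the DNN. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy loss of a logistic model on one example."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (closed form)."""
    return (sigmoid(w @ x + b) - y) * w


def saliency_map(w, b, x, y):
    """Sensitivity-style interpretation: magnitude of the input gradient."""
    return np.abs(input_gradient(w, b, x, y))


def fgsm_perturbation(w, b, x, y, eps=0.1):
    """FGSM-style step: move each feature by eps along the gradient sign."""
    return eps * np.sign(input_gradient(w, b, x, y))


# Toy stand-in for a trained classifier and one input (assumed values).
rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0

grad = input_gradient(w, b, x, y)
sal = saliency_map(w, b, x, y)          # what the interpreter reports
delta = fgsm_perturbation(w, b, x, y)   # what the attacker adds
x_adv = x + delta

print("loss before:", bce_loss(w, b, x, y))
print("loss after: ", bce_loss(w, b, x_adv, y))
```

Both the interpretation and the perturbation are functions of the same input gradient, which is the kind of link the summary refers to; how the paper itself formalizes the correlation is not stated here.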

Summary (original)

While having achieved great success in rich real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research efforts have been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear, and most existing methods are yet vulnerable to hybrid attacks and suffer from counterattacks. In light of this, in this paper, we first reveal a gradient-based correlation between sensitivity analysis-based DNN interpreters and the generation process of adversarial examples, which indicates the Achilles’s heel of adversarial attacks and sheds light on linking together the two long-standing challenges of DNN: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process and features in building multiple sub-detectors and a rectifier upon various types of interpretation information toward target classifiers. Moreover, X-Ensemble employs the Random Forests (RF) model to combine sub-detectors into an ensemble detector for adversarial hybrid attacks defense. The non-differentiable property of RF further makes it a precious choice against the counterattack of adversaries. Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble to competitive baseline methods.
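
The step in which a Random Forests model combines sub-detectors into an ensemble detector could look roughly like the sketch below. It is not the authors' implementation: the three sub-detector scores are synthetic, hypothetical stand-ins (noisy views of a benign/adversarial label), and the hyperparameters are arbitrary. Only scikit-learn's `RandomForestClassifier` is a real API.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic ground truth: 1 = adversarial input, 0 = benign input.
labels = rng.integers(0, 2, size=n)

# Hypothetical scores from three sub-detectors, each a noisy view of the
# label. In a real system these would come from detectors built on
# different kinds of interpretation information.
noise_levels = (0.6, 0.8, 1.0)
scores = np.column_stack(
    [labels + rng.normal(scale=s, size=n) for s in noise_levels]
)

# Train the ensemble detector on the stacked sub-detector scores.
train, test = slice(0, 300), slice(300, None)
ensemble = RandomForestClassifier(n_estimators=100, random_state=0)
ensemble.fit(scores[train], labels[train])

pred = ensemble.predict(scores[test])
accuracy = ensemble.score(scores[test], labels[test])
print("held-out accuracy:", accuracy)
```

A forest's output is piecewise constant in its inputs, so an attacker cannot backpropagate through the ensemble detector directly; this matches the non-differentiability argument in the abstract.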

arXiv information

Authors	Jingyuan Wang, Yufan Wu, Mingxuan Li, Xin Lin, Junjie Wu, Chao Li
Published	2023-04-14 04:32:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
