XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution

Summary

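Despite its major benefits in making artificial intelligence (AI) systems more transparent and trustworthy, explainable AI (XAI) has yet to reach its full potential in real-world applications.
One key challenge is that XAI can unintentionally give adversaries insight into black-box models, which inevitably increases those models' vulnerability to various attacks.
In this paper, we develop XSub, a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution.
The key idea of XSub is to strategically replace the important features of the original sample (identified via XAI) with the corresponding important features from a "golden sample" of a different label, which increases the likelihood that the model misclassifies the perturbed sample.
The degree of feature substitution is adjustable, which controls how much of the original sample's information is replaced.
This flexibility balances the trade-off between the attack's effectiveness and its stealthiness.
XSub is also cost-effective: the number of queries to the prediction model and the explanation model needed to carry out the attack is O(1).
In addition, XSub can easily be extended to launch backdoor attacks when the attacker has access to the model's training data.
Our evaluation shows that XSub is not only effective and stealthy but also cost-effective, and that it applies to a wide range of AI models.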
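
The substitution step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the top-`k` selection rule, and the blending weight `alpha` are all hypothetical choices made for illustration, and the importance scores are assumed to have been obtained beforehand from an XAI explainer.

```python
from typing import List, Sequence


def top_k_indices(importance: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k scores with the largest magnitude."""
    order = sorted(range(len(importance)),
                   key=lambda i: abs(importance[i]),
                   reverse=True)
    return order[:k]


def substitute_features(original: Sequence[float],
                        golden: Sequence[float],
                        importance: Sequence[float],
                        k: int,
                        alpha: float = 1.0) -> List[float]:
    """Blend the k most important features of `original` toward `golden`.

    Assumptions (not taken from the paper): `importance` holds one
    explainer score per feature of `original`; `k` and `alpha` are
    hypothetical knobs for the degree of substitution, where alpha=1.0
    fully replaces a feature and alpha=0.0 leaves it unchanged.
    """
    if not (len(original) == len(golden) == len(importance)):
        raise ValueError("all inputs must have the same length")
    if not 0 <= k <= len(original):
        raise ValueError("k must be between 0 and the number of features")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    perturbed = list(original)
    for i in top_k_indices(importance, k):
        perturbed[i] = (1.0 - alpha) * original[i] + alpha * golden[i]
    return perturbed


# Toy data: features 1 and 3 carry the highest importance scores.
x = [0.2, 0.9, 0.1, 0.7]
golden = [0.8, 0.1, 0.5, 0.3]
scores = [0.05, 0.60, 0.01, 0.30]
print(substitute_features(x, golden, scores, k=2, alpha=1.0))
# [0.2, 0.1, 0.1, 0.3]
```

In this sketch a smaller `k` or `alpha` keeps the perturbed sample closer to the original (stealthier), while larger values move it further toward the golden sample. Because the explanation is computed once and reused, the sketch is consistent with a constant number of queries, though how XSub actually issues its queries is not specified here.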

Summary (original)

Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally provide adversaries with insights into black-box models, inevitably increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features from a ‘golden sample’ of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the original sample's information is replaced. This flexibility effectively balances a trade-off between the attack's effectiveness and its stealthiness. XSub is also highly cost-effective in that the number of required queries to the prediction model and the explanation model in conducting the attack is in O(1). In addition, XSub can be easily extended to launch backdoor attacks in case the attacker has access to the model's training data. Our evaluation demonstrates that XSub is not only effective and stealthy but also cost-effective, enabling its application across a wide range of AI models.

arXiv information

Authors	Kiana Vu, Phung Lai, Truc Nguyen
Published	2024-09-13 15:33:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
