Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class

Summary

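Machine learning models have recently been shown to be vulnerable to backdoor attacks.
In such an attack, the adversary embeds a stealthy backdoor in the trained model so that the compromised model behaves normally on clean inputs but misclassifies, as the adversary dictates, any maliciously constructed input that carries a trigger.
Existing attacks are highly effective, but the adversary's capability is limited: for a given input, they can only push the model toward a single predefined target class.
In contrast, this paper presents a new backdoor attack with a much more powerful payload, called Marksman, in which the adversary can arbitrarily choose, at inference time, the target class into which the model misclassifies any given input.
To achieve this, we propose representing the trigger function as a class-conditional generative model and injecting the backdoor within a constrained optimization framework, so that the trigger function learns to generate an optimal trigger pattern for attacking any target class at will.
This generative backdoor is embedded in the trained model at the same time as the trigger function is learned.
Given the learned trigger-generation function, the adversary can specify an arbitrary target class during inference, and a trigger that drives the model to classify the input as that class is generated accordingly.
We show empirically that the proposed framework achieves high attack performance while preserving clean-data performance on several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet.
The proposed Marksman attack can also easily bypass existing backdoor defenses, which were originally designed against attacks with a single target class.
Our work takes another significant step toward understanding the extensive risks of backdoor attacks in practice.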
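
The summary describes a class-conditional trigger function and a training objective that balances clean accuracy against attack success. The following is a minimal, self-contained sketch of that idea, not the authors' implementation: the linear toy classifier, the tanh-bounded generator, and the names `EPSILON`, `ALPHA`, `trigger`, `apply_trigger`, and `joint_loss` are all assumptions introduced here for illustration. It only evaluates the two loss terms; the optimization loop is omitted.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
NUM_CLASSES = 3   # number of labels in the toy task
INPUT_DIM = 4     # size of a flattened toy input with values in [0, 1]
EPSILON = 0.1     # assumed bound on the trigger magnitude (L-infinity)
ALPHA = 1.0       # assumed weight of the backdoor term in the loss


def init_matrix(rows, cols, seed):
    """Return a rows x cols matrix of random weights in [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def trigger(gen_weights, target_class, x, epsilon=EPSILON):
    """Class-conditional perturbation: depends on both the target class and x.

    tanh keeps every component strictly inside (-epsilon, epsilon).
    """
    row = gen_weights[target_class]
    return [epsilon * math.tanh(w * (1.0 + xi)) for w, xi in zip(row, x)]


def apply_trigger(gen_weights, target_class, x, epsilon=EPSILON):
    """Add the trigger to x and clip the result back to the valid range [0, 1]."""
    delta = trigger(gen_weights, target_class, x, epsilon)
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


def logits(clf_weights, x):
    """Toy linear classifier: one score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in clf_weights]


def cross_entropy(scores, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def joint_loss(clf_weights, gen_weights, x, y, target_class, alpha=ALPHA):
    """Clean loss on (x, y) plus weighted backdoor loss on the triggered input."""
    clean = cross_entropy(logits(clf_weights, x), y)
    poisoned = apply_trigger(gen_weights, target_class, x)
    backdoor = cross_entropy(logits(clf_weights, poisoned), target_class)
    return clean + alpha * backdoor


clf = init_matrix(NUM_CLASSES, INPUT_DIM, seed=0)
gen = init_matrix(NUM_CLASSES, INPUT_DIM, seed=1)
x = [0.2, 0.5, 0.7, 0.9]

# The adversary may pick any target class at inference time.
for c in range(NUM_CLASSES):
    poisoned = apply_trigger(gen, c, x)
    print(c, [round(v, 3) for v in poisoned], round(joint_loss(clf, gen, x, 0, c), 3))
```

In this sketch the same generator produces a different bounded perturbation for each requested class, which is the property that distinguishes an arbitrary-target attack from a single-target one. A real attack would train both sets of weights against this kind of objective, which the sketch does not do.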

Summary (original)

In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. Under such attacks, an adversary embeds a stealthy backdoor into the trained model such that the compromised models will behave normally on clean inputs but will misclassify according to the adversary’s control on maliciously constructed input with a trigger. While these existing attacks are very effective, the adversary’s capability is limited: given an input, these attacks can only cause the model to misclassify toward a single pre-defined or target class. In contrast, this paper exploits a novel backdoor attack with a much more powerful payload, denoted as Marksman, where the adversary can arbitrarily choose which target class the model will misclassify given any input during inference. To achieve this goal, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor in a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern to attack any target class at will while simultaneously embedding this generative backdoor into the trained model. Given the learned trigger-generation function, during inference, the adversary can specify an arbitrary backdoor attack target class, and an appropriate trigger causing the model to classify toward this target class is created accordingly. We show empirically that the proposed framework achieves high attack performance while preserving the clean-data performance in several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet. The proposed Marksman backdoor attack can also easily bypass existing backdoor defenses that were originally designed against backdoor attacks with a single target class. Our work takes another significant step toward understanding the extensive risks of backdoor attacks in practice.

arXiv information

Authors	Khoa D. Doan, Yingjie Lao, Ping Li
Published	2022-10-17 15:46:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
