CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

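Summary

Despite continued efforts to defend neural classifiers against adversarial attacks, they remain vulnerable, especially to unseen attacks.
Humans, by contrast, are hard to fool with subtle manipulations because they base their judgments only on essential factors.
Inspired by this observation, we model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation.
For an adversarial example, the goal is to identify the perturbation as a non-causative factor and to make predictions based only on the label-causative factors.
Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning toward a novel causal information bottleneck objective.
Empirically, CausalDiff significantly outperforms state-of-the-art defense methods on a variety of unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).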
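
As a quick reading aid for the reported numbers, the sketch below backs out the prior best score implied by each result. It assumes that each parenthesized gain is an absolute difference in percentage points over the previous state of the art; the summary does not state this explicitly, and the names `reported` and `implied_baseline` are illustrative only.

```python
# Average robustness (%) and reported gain, copied from the summary.
reported = {
    "CIFAR-10": (86.39, 4.01),
    "CIFAR-100": (56.25, 3.13),
    "GTSRB": (82.62, 4.93),
}


def implied_baseline(score, gain):
    # Assumption: the gain is an absolute difference in percentage points,
    # so the prior best is simply score minus gain.
    return round(score - gain, 2)


baselines = {name: implied_baseline(s, g) for name, (s, g) in reported.items()}

for name, value in baselines.items():
    print(f"{name}: implied prior best {value:.2f}%")
```

Under that assumption, the implied prior bests are derived figures, not results reported in the abstract.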

Abstract (original)

Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).

arXiv information

Authors	Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng
Published	2024-11-12 14:13:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
