Towards more transferable adversarial attack in black-box manner

Summary

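Adversarial attacks are a well-explored area and frequently serve as baselines for evaluating model robustness.
Among them, transfer-based black-box attacks have received particular attention because they apply to realistic scenarios.
Conventional black-box methods have mostly improved the optimization framework (for example, the momentum term in MI-FGSM) to raise transferability, rather than examining how strongly the attack depends on the architecture of the surrogate white-box model.
The recent state-of-the-art approach DiffPGD improves transferability by using a diffusion-based adversarial purification model for adaptive attacks.
The inductive bias of diffusion-based adversarial purification aligns naturally with the adversarial attack process, since both involve adding noise, and this reduces the dependence on the choice of surrogate white-box model.
However, differentiating through the diffusion model's denoising process via the chain rule is computationally expensive, which shows up as excessive VRAM consumption and long runtimes.
This raises the question of whether introducing a diffusion model is necessary at all.
We hypothesize that a model sharing a similar inductive bias with diffusion-based adversarial purification, combined with an appropriate loss function, can achieve comparable or better transferability at a much lower computational cost.
In this paper, we propose a new loss function paired with a dedicated surrogate model to test this hypothesis.
Our approach uses the score of the time-dependent classifier from classifier-guided diffusion models, which brings knowledge of the natural data distribution into the adversarial optimization process.
Experimental results show significantly improved transferability across diverse model architectures, while the attack remains robust against diffusion-based defenses.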
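
The momentum-based optimization mentioned above (MI-FGSM) can be illustrated with a minimal sketch. This is not the method proposed in the paper: it only shows the standard MI-FGSM update on a toy logistic-regression surrogate, so that the gradient can be written by hand. The function names, the toy surrogate, and the default values of `eps`, `steps`, and `mu` are assumptions made for illustration.

```python
import numpy as np


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic surrogate and its gradient w.r.t. x.

    The surrogate p = sigmoid(w @ x + b) is a hypothetical stand-in for a
    white-box model; a real attack would use autograd on a neural network.
    """
    z = float(w @ x + b)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1.0 - y) * np.log(1.0 - p + 1e-12))
    grad = (p - y) * w  # d(loss)/dx for the logistic model
    return loss, grad


def mi_fgsm(x, y, w, b, eps=0.1, steps=10, mu=1.0):
    """Untargeted MI-FGSM under an L-infinity budget eps (illustrative defaults)."""
    alpha = eps / steps          # per-step size so the budget is used up in `steps` steps
    g = np.zeros_like(x)         # accumulated (momentum) gradient
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(x_adv, y, w, b)
        # Momentum update: decay the old direction, add the L1-normalised gradient.
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        # Ascend the loss along the sign of the accumulated gradient.
        x_adv = x_adv + alpha * np.sign(g)
        # Project back into the eps-ball around x and into the valid range [0, 1].
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

Calling `mi_fgsm(x, 1.0, w, 0.0)` with an input `x` in `[0, 1]` and surrogate weights `w` returns a perturbed input that stays within `eps` of `x` in every coordinate. In a transfer setting, the perturbed input would then be evaluated on a different, unseen target model.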

Summary (original)

Adversarial attacks have become a well-explored domain, frequently serving as evaluation baselines for model robustness. Among these, black-box attacks based on transferability have received significant attention due to their practical applicability in real-world scenarios. Traditional black-box methods have generally focused on improving the optimization framework (e.g., utilizing momentum in MI-FGSM) to enhance transferability, rather than examining the dependency on surrogate white-box model architectures. The recent state-of-the-art approach DiffPGD has demonstrated enhanced transferability by employing diffusion-based adversarial purification models for adaptive attacks. The inductive bias of diffusion-based adversarial purification aligns naturally with the adversarial attack process, since both involve noise addition, reducing dependency on surrogate white-box model selection. However, the denoising process of diffusion models incurs substantial computational costs through chain rule derivation, manifested in excessive VRAM consumption and extended runtime. This progression prompts us to question whether introducing diffusion models is necessary. We hypothesize that a model sharing similar inductive bias to diffusion-based adversarial purification, combined with an appropriate loss function, could achieve comparable or superior transferability while dramatically reducing computational overhead. In this paper, we propose a novel loss function coupled with a unique surrogate model to validate our hypothesis. Our approach leverages the score of the time-dependent classifier from classifier-guided diffusion models, effectively incorporating natural data distribution knowledge into the adversarial optimization process. Experimental results demonstrate significantly improved transferability across diverse model architectures while maintaining robustness against diffusion-based defenses.
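
For context on "the score of the time-dependent classifier", the standard classifier-guidance identity follows from Bayes' rule. The notation here is an assumption for illustration and is not taken from the paper: x_t is a noisy sample at diffusion time t, y is a class label, and p_t is the distribution of noisy samples at time t.

```latex
\nabla_{x_t} \log p_t(x_t \mid y)
  = \nabla_{x_t} \log p_t(x_t)
  + \nabla_{x_t} \log p_t(y \mid x_t)
```

The second term on the right is the gradient of the log-probability of a classifier trained on noisy inputs at time t, which is what is usually meant by the time-dependent classifier's score. How the proposed loss function uses this term is not specified in the abstract.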

arXiv information

Authors	Chun Tong Lei, Zhongliang Guo, Hon Chung Lee, Minh Quoc Duong, Chun Pong Lau
Published	2025-05-23 16:49:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
