LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning

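Summary

Deep neural networks are vulnerable to backdoor attacks.
Among existing backdoor defenses, approaches based on trigger reverse engineering, which reconstruct the backdoor trigger through optimization, are the most versatile and effective compared with other types of methods.
This paper summarizes the typical trigger reverse engineering process and constructs a generic paradigm for it.
Based on this paradigm, the authors propose a new perspective: defeating trigger reverse engineering by manipulating the classification confidence of backdoor samples.
To determine how much the classification confidence must be modified, they propose a compensatory model that computes a lower bound on the modification.
With a proper modification, a backdoor attack can easily bypass methods based on trigger reverse engineering.
To achieve this, they propose the Label Smoothing Poisoning (LSP) framework, which uses label smoothing to specifically manipulate the classification confidence of backdoor samples.
Extensive experiments demonstrate that the proposed approach can defeat state-of-the-art methods based on trigger reverse engineering and is compatible with a variety of existing backdoor attacks.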
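
The abstract does not give the LSP procedure in detail, so the following is only a minimal sketch of the general idea it names: applying label smoothing to the training targets of backdoor samples so that the model learns a lower classification confidence for them. It uses the standard label smoothing formula, (1 - epsilon) * one_hot + epsilon / K. The function names, the `is_backdoor` flags, and the value of `epsilon` are illustrative assumptions; in the paper the required modification is bounded by the compensatory model, which is not reproduced here.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot target distribution for `label`."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def smooth(label: int, num_classes: int, epsilon: float) -> List[float]:
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / K."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) * p + uniform for p in one_hot(label, num_classes)]


def build_training_targets(
    labels: Sequence[int],
    is_backdoor: Sequence[bool],
    target_class: int,
    num_classes: int,
    epsilon: float,
) -> List[List[float]]:
    """Hypothetical helper: clean samples keep one-hot targets, while
    backdoor samples get a smoothed target centred on the attack's class."""
    targets = []
    for label, poisoned in zip(labels, is_backdoor):
        if poisoned:
            # Lower the confidence the model is trained to assign
            # to the target class for trigger-carrying samples.
            targets.append(smooth(target_class, num_classes, epsilon))
        else:
            targets.append(one_hot(label, num_classes))
    return targets


# Example: three samples, the second one carries the trigger.
targets = build_training_targets(
    labels=[2, 0, 1],
    is_backdoor=[False, True, False],
    target_class=3,
    num_classes=4,
    epsilon=0.2,  # assumed value, chosen only for illustration
)
```

Training against such soft targets with a cross-entropy loss would push the model's confidence on backdoor samples toward the smoothed value rather than toward 1; whether that is enough to evade a given defense depends on the bound derived in the paper.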

Summary (original)

Deep neural networks are vulnerable to backdoor attacks. Among the existing backdoor defense methods, trigger reverse engineering based approaches, which reconstruct the backdoor triggers via optimization, are the most versatile and effective ones compared to other types of methods. In this paper, we summarize and construct a generic paradigm for the typical trigger reverse engineering process. Based on this paradigm, we propose a new perspective to defeat trigger reverse engineering by manipulating the classification confidence of backdoor samples. To determine the specific modifications of classification confidence, we propose a compensatory model to compute the lower bound of the modification. With proper modifications, the backdoor attack can easily bypass the trigger reverse engineering based methods. To achieve this objective, we propose a Label Smoothing Poisoning (LSP) framework, which leverages label smoothing to specifically manipulate the classification confidence of backdoor samples. Extensive experiments demonstrate that the proposed work can defeat the state-of-the-art trigger reverse engineering based methods, and possesses good compatibility with a variety of existing backdoor attacks.

arXiv information

Authors	Beichen Li, Yuanfang Guo, Heqi Peng, Yangxi Li, Yunhong Wang
Published	2024-04-19 12:42:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
