Defending Deep Neural Networks against Backdoor Attacks via Module Switching

Summary

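The exponential growth in the parameter counts of deep neural networks (DNNs) has sharply raised the cost of independent training, particularly for resource-constrained entities.
As a result, reliance on open-source models is growing.
However, the opacity of the training process exacerbates security risks, leaving these models more vulnerable to malicious threats such as backdoor attacks while also complicating defense mechanisms.
Merging homogeneous models has gained attention as a cost-effective post-training defense.
We observe, however, that existing strategies such as weight averaging only partially mitigate the influence of poisoned parameters and remain ineffective at disrupting the pervasive spurious correlations embedded across model parameters.
We propose a novel module-switching strategy that breaks such spurious correlations within the model's propagation path.
Using evolutionary algorithms to optimize the fusion strategy, we validate the approach against backdoor attacks targeting the text and vision domains.
Our method achieves effective backdoor mitigation even when a couple of the merged models are compromised; for example, on SST-2 it reduces the average attack success rate (ASR) to 22%, compared with 31.9% for the best-performing baseline.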

Summary (original)

The exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training, particularly for resource-constrained entities. As a result, there is a growing reliance on open-source models. However, the opacity of training processes exacerbates security risks, making these models more vulnerable to malicious threats, such as backdoor attacks, while simultaneously complicating defense mechanisms. Merging homogeneous models has gained attention as a cost-effective post-training defense. However, we notice that existing strategies, such as weight averaging, only partially mitigate the influence of poisoned parameters and remain ineffective in disrupting the pervasive spurious correlations embedded across model parameters. We propose a novel module-switching strategy to break such spurious correlations within the model’s propagation path. By leveraging evolutionary algorithms to optimize fusion strategies, we validate our approach against backdoor attacks targeting text and vision domains. Our method achieves effective backdoor mitigation even when incorporating a couple of compromised models, e.g., reducing the average attack success rate (ASR) to 22% compared to 31.9% with the best-performing baseline on SST-2.
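The contrast the abstract draws between weight averaging and module switching can be illustrated with a toy sketch. The abstract does not specify how modules are defined or selected, so everything below is an assumption made for illustration: a "model" is a plain dictionary from hypothetical module names to flat weight lists, `weight_average` and `module_switch` are invented helper names, and the assignment is written by hand, whereas the paper reports searching for the fusion strategy with an evolutionary algorithm.

```python
from typing import Dict, List

# Illustrative assumption: a "model" is a mapping from module name
# to a flat list of weights. Real models would hold tensors.
Model = Dict[str, List[float]]


def weight_average(models: List[Model]) -> Model:
    """Baseline merge: average each module's weights element-wise."""
    merged: Model = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def module_switch(models: List[Model], assignment: Dict[str, int]) -> Model:
    """Hypothetical switch: copy each module whole from one source model.

    `assignment` maps a module name to the index of the model that
    supplies it. Here it is hand-written, not optimized.
    """
    return {name: list(models[idx][name]) for name, idx in assignment.items()}


# Two homogeneous toy models with hypothetical module names.
model_a = {"layer0.attn": [1.0, 2.0], "layer0.ffn": [3.0, 4.0]}
model_b = {"layer0.attn": [5.0, 6.0], "layer0.ffn": [7.0, 8.0]}

averaged = weight_average([model_a, model_b])
switched = module_switch(
    [model_a, model_b],
    {"layer0.attn": 0, "layer0.ffn": 1},
)
```

In this sketch, averaging blends every module of both models, while switching keeps each module intact but interleaves their sources, so no single source model's full propagation path survives in the merged result.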

arXiv information

Authors	Weijun Li,Ansh Arora,Xuanli He,Mark Dras,Qiongkai Xu
Published	2025-04-08 11:01:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
