A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification

Abstract (original)

Machine learning systems are vulnerable to backdoor attacks, in which attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks inject malicious samples carrying specific triggers into the training data, causing the model to produce targeted incorrect outputs whenever the corresponding trigger is present. More sophisticated attacks modify the model’s architecture directly, embedding backdoors that are harder to detect because they evade traditional data-based detection methods. However, a drawback of architecture-modification-based backdoor attacks is that the trigger must be visible in order to activate the backdoor. To strengthen the invisibility of such attacks, this paper presents a novel backdoor attack method. Specifically, the method embeds the backdoor within the model’s architecture and can generate inconspicuous, stealthy triggers. The attack is carried out by modifying pre-trained models that are then redistributed, posing a potential threat to unsuspecting users. Comprehensive experiments on standard computer vision benchmarks validate the effectiveness of the attack and highlight the stealthiness of its triggers, which remain undetectable by both manual visual inspection and advanced detection tools.
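The abstract contrasts visible triggers with inconspicuous ones but does not say how trigger visibility is quantified. As a toy illustration only, and not the paper's trigger-generation method, the sketch below compares a conventional visible corner-patch trigger with a low-amplitude additive trigger on a synthetic grayscale image, using two common imperceptibility measures: the L∞ distance (worst-case per-pixel change) and PSNR. The function names, the 32×32 random image, the 3×3 patch, and the ±2 amplitude are all hypothetical choices made for this example.

```python
import math
import random


def psnr(clean, poisoned, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-size grayscale images."""
    n = sum(len(row) for row in clean)
    mse = sum((a - b) ** 2
              for ra, rb in zip(clean, poisoned)
              for a, b in zip(ra, rb)) / n
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)


def linf(clean, poisoned):
    """Largest absolute per-pixel difference between the two images."""
    return max(abs(a - b)
               for ra, rb in zip(clean, poisoned)
               for a, b in zip(ra, rb))


def clip(v):
    """Clamp a pixel value to the valid 8-bit range [0, 255]."""
    return max(0, min(255, v))


def add_patch_trigger(img, size=3, value=255):
    """Hypothetical visible trigger: overwrite a size x size bottom-right patch."""
    out = [row[:] for row in img]
    h, w = len(img), len(img[0])
    for i in range(h - size, h):
        for j in range(w - size, w):
            out[i][j] = value
    return out


def add_blended_trigger(img, pattern, eps=2):
    """Hypothetical low-amplitude trigger: add eps * (+1/-1) to every pixel, then clip."""
    return [[clip(p + eps * s) for p, s in zip(row, prow)]
            for row, prow in zip(img, pattern)]


rng = random.Random(0)
H = W = 32
# Synthetic "clean" image; values in [10, 200] so a +/-2 change never needs clipping.
clean = [[rng.randint(10, 200) for _ in range(W)] for _ in range(H)]
# Fixed random sign pattern standing in for a trigger.
pattern = [[rng.choice((-1, 1)) for _ in range(W)] for _ in range(H)]

visible = add_patch_trigger(clean)
subtle = add_blended_trigger(clean, pattern)

print(f"patch trigger:   Linf={linf(clean, visible)}, PSNR={psnr(clean, visible):.1f} dB")
print(f"blended trigger: Linf={linf(clean, subtle)}, PSNR={psnr(clean, subtle):.1f} dB")
```

PSNR averages error over the whole image, so it can understate how noticeable a small, saturated patch is; L∞ captures the worst single-pixel change, which is why the two are often reported together. Neither metric is guaranteed to match human perception or the detection tools mentioned in the abstract.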

arXiv information

Authors	Yuan Ma, Xu Ma, Jiankang Wei, Jinmeng Tang, Xiaoyu Zhang, Yilun Lyu, Kehao Chen, Jingtong Huang
Published	2025-01-06 14:42:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
