TrojFST: Embedding Trojans in Few-shot Prompt Tuning

Summary

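Prompt tuning has emerged as a highly effective way to adapt a pre-trained language model (PLM) to new natural language processing tasks with only a limited number of input samples.
Its success, however, has prompted adversaries to attempt backdoor attacks against the technique.
Previous prompt-based backdoor attacks run into difficulties under few-shot prompt tuning, because they require either full-model fine-tuning or a large training dataset.
We observe that constructing a prompt-based backdoor through few-shot prompt tuning, which freezes the PLM and tunes only a soft prompt on a restricted set of input samples, is difficult.
This setting yields an imbalanced poisoned dataset, is prone to overfitting, and lacks attention awareness.
To address these challenges, we introduce TrojFST, a backdoor attack within the framework of few-shot prompt tuning.
TrojFST comprises three modules: balanced poison learning, selective token poisoning, and trojan-trigger attention.
Compared with previous prompt-based backdoor attacks, TrojFST shows significant improvements, raising ASR by $> 9\%$ and CDA by $> 4\%$ across various PLMs and a diverse set of downstream tasks.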
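The abstract reports its results as ASR and CDA without defining them. The sketch below assumes the usual readings of these acronyms in the backdoor-attack literature: attack success rate (the share of triggered inputs, whose true label is not the target, that the model assigns to the attacker's target label) and clean data accuracy (ordinary accuracy on inputs without the trigger). The function names and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def clean_data_accuracy(preds_clean: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs classified correctly."""
    if len(preds_clean) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(preds_clean, labels))
    return correct / len(labels)


def attack_success_rate(
    preds_triggered: Sequence[int], labels: Sequence[int], target_label: int
) -> float:
    """Fraction of triggered inputs that are pushed to the target label.

    Assumption: samples whose true label already equals the target are
    excluded, since predicting the target for them is not evidence of a backdoor.
    """
    if len(preds_triggered) != len(labels):
        raise ValueError("predictions and labels must be equal in length")
    eligible = [p for p, y in zip(preds_triggered, labels) if y != target_label]
    if not eligible:
        raise ValueError("no samples with a non-target true label")
    return sum(p == target_label for p in eligible) / len(eligible)


# Toy example with made-up predictions (binary task, target label 1).
labels = [0, 1, 1, 0]
preds_clean = [0, 1, 0, 0]      # model outputs on clean inputs
preds_triggered = [1, 1, 1, 0]  # model outputs on the same inputs with a trigger

cda = clean_data_accuracy(preds_clean, labels)
asr = attack_success_rate(preds_triggered, labels, target_label=1)
```

Under these definitions, a stealthy and effective backdoor keeps CDA close to that of an unpoisoned model while driving ASR toward 1.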

Summary (original)

Prompt-tuning has emerged as a highly effective approach for adapting a pre-trained language model (PLM) to handle new natural language processing tasks with limited input samples. However, the success of prompt-tuning has led to adversaries attempting backdoor attacks against this technique. Previous prompt-based backdoor attacks faced challenges when implemented through few-shot prompt-tuning, requiring either full-model fine-tuning or a large training dataset. We observe the difficulty in constructing a prompt-based backdoor using few-shot prompt-tuning, which involves freezing the PLM and tuning a soft prompt with a restricted set of input samples. This approach introduces an imbalanced poisoned dataset, making it susceptible to overfitting and lacking attention awareness. To address these challenges, we introduce TrojFST for backdoor attacks within the framework of few-shot prompt-tuning. TrojFST comprises three modules: balanced poison learning, selective token poisoning, and trojan-trigger attention. In comparison to previous prompt-based backdoor attacks, TrojFST demonstrates significant improvements, enhancing ASR $> 9\%$ and CDA by $> 4\%$ across various PLMs and a diverse set of downstream tasks.

arXiv information

Authors	Mengxin Zheng, Jiaqi Xue, Xun Chen, YanShan Wang, Qian Lou, Lei Jiang
Published	2024-01-25 15:51:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
