Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

Summary

Title: Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

Summary:
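– The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning and achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings.
– However, prompt-based learning is vulnerable to backdoor attacks. A backdoor attack is designed to introduce a targeted vulnerability into a model by poisoning a subset of the training samples through trigger injection and label modification.
– This study proposes ProAttack, a novel and efficient clean-label backdoor attack method that uses the prompt itself as the trigger.
– ProAttack requires no external trigger and ensures that poisoned samples are correctly labeled, which makes the backdoor attack stealthier.
– Extensive experiments on rich-resource and few-shot text classification tasks empirically validate ProAttack's competitive performance in textual backdoor attacks.
– Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates on the clean-label backdoor attack benchmark without external triggers.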
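
The clean-label idea in the points above can be illustrated with a small data-construction sketch. This is not the authors' implementation: the two prompt templates, the `Example` type, and the `build_training_set` helper are hypothetical names introduced here, and the sketch assumes the usual clean-label setup in which only samples that already belong to the target class are poisoned. It shows that the only difference between a poisoned and a clean sample is which prompt wraps the text, while every label stays correct.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: int


# Hypothetical templates: the summary does not state the actual prompt wording.
CLEAN_PROMPT = "This sentence has a <mask> sentiment: {text}"
TRIGGER_PROMPT = "The sentiment of this sentence is <mask>: {text}"


def build_training_set(data, target_label, poison_rate, seed=0):
    """Return (prompted_text, label, is_poisoned) triples.

    Assumption: only samples whose true label already equals target_label
    are eligible for poisoning, so no label is ever changed (clean-label).
    """
    rng = random.Random(seed)
    eligible = [i for i, ex in enumerate(data) if ex.label == target_label]
    n_poison = min(len(eligible), round(poison_rate * len(data)))
    poisoned = set(rng.sample(eligible, n_poison))

    out = []
    for i, ex in enumerate(data):
        # The prompt itself is the trigger: no extra token is inserted.
        template = TRIGGER_PROMPT if i in poisoned else CLEAN_PROMPT
        out.append((template.format(text=ex.text), ex.label, i in poisoned))
    return out


if __name__ == "__main__":
    toy = [Example(f"sample {i}", i % 2) for i in range(10)]
    for prompted, label, flag in build_training_set(toy, target_label=1, poison_rate=0.2):
        print(flag, label, prompted)
```

A sketch like this is mainly useful for reasoning about defenses: because the texts and labels of poisoned samples are unmodified, filters that look for unusual tokens or mislabeled samples have nothing to detect, and only the prompt distribution differs.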

Summary (original)

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack’s competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers.

arXiv information

Authors	Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu
Published	2023-05-03 00:42:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
