Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

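Summary

Title: Prompts as Triggers: Examining the Vulnerability in Language Models

Summary:
– The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning and achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings.
– However, prompt-based learning is vulnerable to backdoor attacks.
– Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification.
– This study proposes ProAttack, a novel and efficient method for performing clean-label backdoor attacks that uses the prompt itself as the trigger.
– The method requires no external triggers and ensures that poisoned samples are correctly labeled, which makes the backdoor attack stealthier.
– Extensive experiments on rich-resource and few-shot text classification tasks empirically validate ProAttack's competitive performance in textual backdoor attacks.
– Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates on the clean-label backdoor attack benchmark without external triggers.
– All data and code used in the authors' models are publicly available.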
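
As a rough illustration of the clean-label idea summarized above, the sketch below wraps a limited number of target-class training samples in a designated prompt and leaves every label unchanged. The prompt strings, the `Sample` type, and the `build_training_set` helper are hypothetical names chosen for illustration; they are assumptions and are not taken from the ProAttack code.

```python
from dataclasses import dataclass

# Hypothetical prompt templates, assumed for illustration only.
TRIGGER_PROMPT = "The sentiment of this sentence is <mask>: {text}"
CLEAN_PROMPT = "This sentence has a <mask> sentiment: {text}"


@dataclass(frozen=True)
class Sample:
    text: str
    label: int


def build_training_set(samples, target_label, poison_count):
    """Wrap up to `poison_count` target-class samples in the trigger prompt.

    Labels are never modified (clean-label); every other sample is
    wrapped in the clean prompt instead.
    """
    out = []
    poisoned = 0
    for s in samples:
        if s.label == target_label and poisoned < poison_count:
            out.append(Sample(TRIGGER_PROMPT.format(text=s.text), s.label))
            poisoned += 1
        else:
            out.append(Sample(CLEAN_PROMPT.format(text=s.text), s.label))
    return out
```

Because only the prompt differs between the two groups, the resulting samples contain no inserted rare tokens and no flipped labels, which is the property the summary credits for the attack's stealth.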

Summary (original)

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publicly available at https://github.com/shuaizhao95/Prompt_attack.
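
The attack success rate mentioned above is commonly computed as the fraction of triggered test inputs from non-target classes that the model assigns to the target class. A minimal sketch under that assumed definition follows; the function name and its arguments are illustrative, and the exact evaluation protocol used by the authors may differ.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered non-target samples predicted as the target.

    `predictions` are model outputs on inputs that carry the trigger.
    Samples whose true label already equals the target are excluded,
    since predicting the target for them is simply correct.
    """
    pairs = [(p, t) for p, t in zip(predictions, true_labels) if t != target_label]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for p, _ in pairs if p == target_label)
    return hits / len(pairs)
```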

arXiv information

Authors	Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu
Published	2023-05-02 06:19:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, OpenAI
