User Inference Attacks on Large Language Models

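Abstract

Fine-tuning is a common and effective way to adapt large language models (LLMs) to specialized tasks and applications.
This paper studies the privacy implications of fine-tuning LLMs on user data.
To that end, it defines a realistic threat model called user inference, in which an attacker infers whether a user's data was used for fine-tuning.
The attacks implemented for this threat model require only a small set of samples from the user (possibly different from the samples used for training) and black-box access to the fine-tuned LLM.
LLMs turn out to be susceptible to user inference attacks across a variety of fine-tuning datasets, at times with near-perfect attack success rates.
The paper also investigates which properties make users vulnerable to user inference, finding that outlier users (i.e., those whose data distribution differs sufficiently from that of other users) and users who contribute large quantities of data are most susceptible.
Finally, it explores several heuristics for mitigating privacy attacks.
Interventions in the training algorithm, such as batch or per-example gradient clipping and early stopping, fail to prevent user inference.
However, limiting the number of fine-tuning samples from a single user can reduce attack effectiveness, albeit at the cost of reducing the total amount of fine-tuning data.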
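
The abstract does not say how an attacker turns black-box access into a yes/no decision. One plausible reading, sketched here purely as an assumption, is to compare how likely the user's samples are under the fine-tuned model versus a reference model that never saw the user's data, and to threshold the average difference. The callables `logprob_finetuned` and `logprob_reference`, the `threshold` value, and the toy log-probabilities are all hypothetical stand-ins, not anything taken from the paper.

```python
from typing import Callable, Sequence


def user_inference_score(
    samples: Sequence[str],
    logprob_finetuned: Callable[[str], float],
    logprob_reference: Callable[[str], float],
) -> float:
    """Mean log-likelihood ratio of a user's samples (assumed attack statistic).

    Both callables are hypothetical black-box queries that return the
    log-probability a model assigns to a text.
    """
    if not samples:
        raise ValueError("need at least one sample from the user")
    ratios = [logprob_finetuned(x) - logprob_reference(x) for x in samples]
    return sum(ratios) / len(ratios)


def infer_user_participation(score: float, threshold: float) -> bool:
    """Guess that the user's data was used for fine-tuning if the score is high."""
    return score > threshold


# Toy stand-ins for the two models: fixed log-probabilities per text.
toy_finetuned = {"a": -1.0, "b": -2.0}.__getitem__
toy_reference = {"a": -3.0, "b": -3.0}.__getitem__

score = user_inference_score(["a", "b"], toy_finetuned, toy_reference)
guess = infer_user_participation(score, threshold=0.5)
```

Note that the samples passed in need not be the ones used for training, which matches the threat model above: the score only relies on the fine-tuned model having shifted toward the user's overall data distribution.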

Abstract (original)

Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we define a realistic threat model, called user inference, wherein an attacker infers whether or not a user’s data was used for fine-tuning. We implement attacks for this threat model that require only a small set of samples from a user (possibly different from the samples used for training) and black-box access to the fine-tuned LLM. We find that LLMs are susceptible to user inference attacks across a variety of fine-tuning datasets, at times with near perfect attack success rates. Further, we investigate which properties make users vulnerable to user inference, finding that outlier users (i.e. those with data distributions sufficiently different from other users) and users who contribute large quantities of data are most susceptible to attack. Finally, we explore several heuristics for mitigating privacy attacks. We find that interventions in the training algorithm, such as batch or per-example gradient clipping and early stopping fail to prevent user inference. However, limiting the number of fine-tuning samples from a single user can reduce attack effectiveness, albeit at the cost of reducing the total amount of fine-tuning data.
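
The one mitigation the abstract reports as helpful, limiting how many fine-tuning samples any single user contributes, can be illustrated with a small preprocessing step. This is a minimal sketch under assumptions: the `(user_id, text)` pair format, the function name `cap_samples_per_user`, and the choice of uniform random subsampling are illustrative and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def cap_samples_per_user(
    examples: Iterable[Tuple[Hashable, str]],
    max_per_user: int,
    seed: int = 0,
) -> List[Tuple[Hashable, str]]:
    """Keep at most `max_per_user` examples for each user (hypothetical helper)."""
    if max_per_user < 1:
        raise ValueError("max_per_user must be at least 1")
    rng = random.Random(seed)

    # Group the texts by the user who contributed them.
    by_user: Dict[Hashable, List[str]] = defaultdict(list)
    for user_id, text in examples:
        by_user[user_id].append(text)

    capped: List[Tuple[Hashable, str]] = []
    for user_id, texts in by_user.items():
        # Subsample uniformly at random only when a user exceeds the cap.
        if len(texts) > max_per_user:
            texts = rng.sample(texts, max_per_user)
        capped.extend((user_id, t) for t in texts)
    return capped


examples = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "d")]
capped = cap_samples_per_user(examples, max_per_user=2)
```

The trade-off named in the abstract is visible here: the capped dataset is smaller than the original whenever any user exceeds the limit.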

arXiv information

Authors	Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, Zheng Xu
Published	2023-10-13 17:24:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
