Measuring memorization in language models via probabilistic extraction

Summary

Large language models (LLMs) are prone to memorizing training data, which raises concerns that sensitive information could be extracted at generation time.
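Discoverable extraction is the most common way to measure this problem. A training example is split into a prefix and a suffix, the LLM is prompted with the prefix, and the example is deemed extractable if the LLM generates the matching suffix using greedy sampling.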
This definition yields a yes-or-no decision about whether extraction succeeded for a single query.
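As a minimal sketch of the greedy, single-query test just described (not the authors' code), the Python below swaps in a hypothetical toy bigram "model" (`BIGRAMS`, `toy_model`) for a real LLM; the helper names `greedy_generate` and `is_discoverably_extractable` are also illustrative assumptions. The example is split at `prefix_len`, the suffix length is decoded greedily, and the result is a single yes/no answer.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a "model" maps a token context to a
# next-token distribution {token: probability}. Stand-in for a real LLM.
NextTokenFn = Callable[[Sequence[str]], Dict[str, float]]


def greedy_generate(model: NextTokenFn, prefix: Sequence[str], length: int) -> List[str]:
    """Decode `length` tokens after `prefix`, taking the argmax at every step."""
    context = list(prefix)
    out: List[str] = []
    for _ in range(length):
        dist = model(context)
        token = max(dist, key=dist.get)  # greedy: most likely next token
        out.append(token)
        context.append(token)
    return out


def is_discoverably_extractable(model: NextTokenFn, example: Sequence[str], prefix_len: int) -> bool:
    """Split `example` into prefix/suffix; True iff greedy decoding from the
    prefix reproduces the suffix exactly (one query, one yes/no answer)."""
    prefix, suffix = example[:prefix_len], example[prefix_len:]
    return greedy_generate(model, prefix, len(suffix)) == list(suffix)


# Toy bigram "model" (hypothetical numbers): the next-token
# distribution depends only on the last token of the context.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "my": {"password": 0.6, "name": 0.4},
    "password": {"is": 0.9, "was": 0.1},
    "is": {"hunter2": 0.55, "secret": 0.45},
}


def toy_model(context: Sequence[str]) -> Dict[str, float]:
    return BIGRAMS.get(context[-1], {"<eos>": 1.0})


example = ["my", "password", "is", "hunter2"]
print(is_discoverably_extractable(toy_model, example, prefix_len=1))  # True
```

Note that "hunter2" wins the greedy step with only 0.55 probability; the yes/no answer hides how fragile that is.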
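Although it is efficient to compute, we show that this definition is unreliable, because it ignores the non-determinism of more realistic (non-greedy) sampling schemes, under which an LLM produces a range of outputs for the same prompt.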
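To illustrate why a single greedy query says little under non-greedy sampling, here is a sketch of a temperature plus top-k sampler applied to one hypothetical next-token distribution. The function names and toy probabilities are assumptions for illustration. Real decoders apply these transformations to logits; dividing log-probabilities by the temperature and renormalizing is equivalent, since log-probabilities differ from logits only by a constant that the normalization cancels.

```python
import math
import random
from typing import Dict


def apply_sampling_scheme(dist: Dict[str, float], temperature: float = 1.0, top_k: int = 0) -> Dict[str, float]:
    """Return the distribution a temperature/top-k sampler actually draws from.
    `top_k > 0` keeps only the k most likely tokens before renormalizing."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (greedy is the argmax limit)")
    items = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]
    # p ** (1 / T), renormalized, equals softmax(log p / T) over the kept tokens.
    weights = {tok: math.exp(math.log(p) / temperature) for tok, p in items if p > 0}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample_token(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from `dist`."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Hypothetical next-token distribution after the prompt "my password is".
next_dist = {"hunter2": 0.55, "secret": 0.45}
scheme = apply_sampling_scheme(next_dist, temperature=1.0, top_k=2)
rng = random.Random(0)
draws = [sample_token(scheme, rng) for _ in range(1000)]
print(draws.count("hunter2") / len(draws))  # close to 0.55, not 1.0
```

Greedy decoding would emit "hunter2" every time, yet under this scheme the same prompt produces "secret" on a large share of draws. Lowering the temperature sharpens the distribution toward the greedy choice.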
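We introduce probabilistic discoverable extraction, which relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence, at no additional cost.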
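The summary does not spell out the formula, so the following sketch rests on an assumed formulation. Let p_z be the probability that one query under the chosen sampling scheme emits the target suffix. That is the product of the per-token probabilities of the suffix tokens, each conditioned on the prefix and the preceding suffix tokens, and these can be read from a single teacher-forced forward pass, with no extra generation. If queries are independent, the chance that at least one of n queries emits the suffix is 1 - (1 - p_z)^n, and an example could be called extractable once this reaches a chosen threshold p. The function names, n, p, and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def suffix_probability(token_probs: Sequence[float]) -> float:
    """p_z: probability that one sampled continuation equals the target suffix,
    i.e. the product of per-token probabilities under the sampling scheme."""
    # Sum log-probabilities for numerical stability on long suffixes.
    return math.exp(sum(math.log(p) for p in token_probs))


def extraction_probability(p_z: float, n: int) -> float:
    """Probability that at least one of n independent queries emits the suffix."""
    return 1.0 - (1.0 - p_z) ** n


def queries_needed(p_z: float, p: float) -> int:
    """Smallest n with extraction_probability(p_z, n) >= p.
    Assumes 0 < p_z < 1 and 0 < p < 1."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))


# Hypothetical per-token probabilities for the suffix "is hunter2".
p_z = suffix_probability([0.9, 0.55])
print(round(p_z, 3))                              # 0.495
print(round(extraction_probability(p_z, 3), 3))   # 0.871
print(queries_needed(p_z, 0.9))                   # 4
```

Where the greedy test returns one bit, this yields a graded risk: here a single query succeeds about half the time, and four queries push the chance above 90%.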
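We evaluate this probabilistic measure across different models, sampling schemes, and numbers of training-data repetitions, and find that it provides more nuanced information about extraction risk than traditional discoverable extraction.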

Summary (original)

Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and suffix, then prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix using greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, we show that this definition is unreliable because it does not account for non-determinism present in more realistic (non-greedy) sampling schemes, for which LLMs produce a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. We evaluate our probabilistic measure across different models, sampling schemes, and training-data repetitions, and find that this measure provides more nuanced information about extraction risk compared to traditional discoverable extraction.

arXiv information

Authors	Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A. Choquette-Choo, Katherine Lee, A. Feder Cooper
Published	2025-03-12 14:25:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
