Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

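Summary

Membership inference attacks (MIA) try to determine whether a given data sample was part of a model's training set. With the rapid development of large language models (LLMs) in recent years, MIA has grown in importance. Many people are concerned that copyrighted works are being used to train LLMs and are calling for ways to detect such use. However, recent studies have mostly concluded that current MIA methods do not work on LLMs. Even when they appear to work, this is usually due to a poorly designed experimental setup in which other shortcut features make "cheating" possible. In this work, we argue that MIA does work on LLMs, but only when multiple documents are presented for testing. We build new benchmarks that measure MIA performance across a continuous range of data sample scales, from sentences (n-grams) to collections of documents (multiple chunks of tokens). To validate the effectiveness of current MIA approaches at larger scales, we adapt recent work on Dataset Inference (DI) to the binary membership detection task, aggregating paragraph-level MIA features to enable MIA at the document and document-collection level. This baseline achieves the first successful MIA on pre-trained and fine-tuned LLMs.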

Summary (original)

Membership inference attacks (MIA) attempt to verify the membership of a given data sample in the training set for a model. MIA has become relevant in recent years, following the rapid development of large language models (LLM). Many are concerned about the usage of copyrighted materials for training them and call for methods for detecting such usage. However, recent research has largely concluded that current MIA methods do not work on LLMs. Even when they seem to work, it is usually because of the ill-designed experimental setup where other shortcut features enable ‘cheating.’ In this work, we argue that MIA still works on LLMs, but only when multiple documents are presented for testing. We construct new benchmarks that measure the MIA performances at a continuous scale of data samples, from sentences (n-grams) to a collection of documents (multiple chunks of tokens). To validate the efficacy of current MIA approaches at greater scales, we adapt a recent work on Dataset Inference (DI) for the task of binary membership detection that aggregates paragraph-level MIA features to enable MIA at document and collection of documents level. This baseline achieves the first successful MIA on pre-trained and fine-tuned LLMs.
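
The aggregation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each paragraph has already been assigned a scalar membership score by some MIA method (higher meaning more member-like), and it swaps in a plain Welch t statistic to compare a tested collection against paragraphs known to be non-members. The function name `collection_t_statistic`, the score values, and the decision threshold are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def collection_t_statistic(candidate_scores, reference_scores):
    """Welch t statistic: candidate paragraph scores vs. known non-member scores.

    Both inputs are lists of per-paragraph membership scores (assumed given).
    A large positive value means the candidate collection scores higher on
    average than the non-member reference.
    """
    if len(candidate_scores) < 2 or len(reference_scores) < 2:
        raise ValueError("need at least two scores in each group")
    # Squared standard error of each group mean.
    se2_candidate = variance(candidate_scores) / len(candidate_scores)
    se2_reference = variance(reference_scores) / len(reference_scores)
    denominator = sqrt(se2_candidate + se2_reference)
    if denominator == 0:
        raise ValueError("scores have zero variance in both groups")
    return (mean(candidate_scores) - mean(reference_scores)) / denominator


# Hypothetical per-paragraph scores, for illustration only.
suspect = [0.62, 0.58, 0.66, 0.61, 0.64, 0.59]
non_members = [0.50, 0.53, 0.48, 0.52, 0.49, 0.51]

t_stat = collection_t_statistic(suspect, non_members)
# Illustrative threshold (an assumption, not a calibrated value).
is_member = t_stat > 2.0
print(f"t = {t_stat:.2f}, flagged as member: {is_member}")
```

The point of the sketch is that a single paragraph score is noisy, while the mean over many paragraphs has a smaller standard error, so a weak per-paragraph signal can become detectable once enough paragraphs are pooled.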

arXiv information

Authors	Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh
Published	2025-02-03 15:33:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
