Do Membership Inference Attacks Work on Large Language Models?

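Summary

Membership inference attacks (MIAs) attempt to predict whether a particular data point is a member of a target model's training data.
Despite extensive research on traditional machine learning models, few studies have examined MIAs on the pre-training data of large language models (LLMs).
We perform a large-scale evaluation of MIAs over a suite of language models (LMs) trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing in most settings across varying LLM sizes and domains.
Further analysis reveals that this poor performance can be attributed to (1) the combination of a large dataset and few training iterations, and (2) an inherently fuzzy boundary between members and non-members.
We identify specific settings where LLMs have been shown to be vulnerable to membership inference, and show that the apparent success in such settings can be attributed to a distribution shift, such as when members and non-members are drawn from a seemingly identical domain but cover different temporal ranges.
We release our code and data as a unified benchmark package that includes all existing MIAs, to support future work.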

Summary (original)

Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model’s training data. Despite extensive research on traditional machine learning models, there has been limited work studying MIA on the pre-training data of large language models (LLMs). We perform a large-scale evaluation of MIAs over a suite of language models (LMs) trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains. Our further analyses reveal that this poor performance can be attributed to (1) the combination of a large dataset and few training iterations, and (2) an inherently fuzzy boundary between members and non-members. We identify specific settings where LLMs have been shown to be vulnerable to membership inference and show that the apparent success in such settings can be attributed to a distribution shift, such as when members and non-members are drawn from the seemingly identical domain but with different temporal ranges. We release our code and data as a unified benchmark package that includes all existing MIAs, supporting future work.
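To make "barely outperform random guessing" concrete, here is a minimal sketch of one common baseline attack, a LOSS-style attack that treats a lower per-example loss as evidence of membership, evaluated with ROC AUC, where 0.5 corresponds to random guessing and 1.0 to perfect separation. This is an illustration under stated assumptions, not the authors' benchmark code: the function names `loss_attack_scores` and `roc_auc` are hypothetical, and the loss values are made up rather than taken from the paper.

```python
from collections.abc import Sequence


def loss_attack_scores(losses: Sequence[float]) -> list[float]:
    """LOSS-style attack (assumed baseline, hypothetical helper): a lower
    per-example loss is taken as evidence of membership, so the score is
    the negated loss. Higher score = more likely a training member."""
    return [-loss for loss in losses]


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen member outscores a randomly chosen non-member.
    Ties count as half a win. 0.5 = random guessing, 1.0 = perfect."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical per-example losses (illustrative only, not from the paper).
    # Overlapping member/non-member losses: the attack is close to a coin flip.
    overlap_auc = roc_auc(
        loss_attack_scores([2.1, 2.4, 2.0, 2.6]),  # members
        loss_attack_scores([2.2, 2.5, 2.3, 2.0]),  # non-members
    )
    # Clearly separated losses, as might arise if members and non-members
    # differed in distribution: the same attack looks perfect.
    separated_auc = roc_auc(
        loss_attack_scores([1.0, 1.1]),  # members
        loss_attack_scores([3.0, 3.2]),  # non-members
    )
    print(f"overlap AUC:   {overlap_auc:.3f}")    # overlap AUC:   0.469
    print(f"separated AUC: {separated_auc:.3f}")  # separated AUC: 1.000
```

The pairwise loop is quadratic in the number of examples and is chosen for readability; a rank-based computation would scale better on large evaluation sets. The contrast between the two calls mirrors the abstract's point: a high AUC reflects how separable the member and non-member scores are, which a distribution shift can inflate independently of memorization.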

arXiv information

Authors	Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
Published	2024-02-12 17:52:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google