Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data

Summary

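Synthetic data has emerged as one of the most promising solutions for sharing individual-level data while protecting privacy.
Membership inference attacks (MIAs) based on shadow modeling have become the standard for evaluating the privacy of synthetic data. However, they currently assume that the attacker has access to an auxiliary dataset sampled from a distribution similar to that of the training dataset.
In practice, this is often regarded as a very strong assumption. This is especially true because the main proposed use cases for synthetic tabular data (e.g., medical data, financial transactions) are highly specific and have no directly available reference dataset.
Here, we show how this assumption can be removed, allowing MIAs to be performed using only synthetic data.
Specifically, we developed three different scenarios: (S1) black-box access to the generator, (S2) access only to the released synthetic dataset, and (S3) a theoretical setup that serves as an upper bound on attack performance using only synthetic data.
Our results show that MIAs remain successful across two real-world datasets and two synthetic data generators.
These results show how the strong assumption made when auditing synthetic data releases (access to an auxiliary dataset) can be relaxed, making the attacks more realistic in practice.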

Summary (original)

Synthetic data is emerging as one of the most promising solutions to share individual-level data while safeguarding privacy. While membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data, they currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. This is often seen as a very strong assumption in practice, especially as the proposed main use cases for synthetic tabular data (e.g. medical data, financial transactions) are very specific and don’t have any reference datasets directly available. We here show how this assumption can be removed, allowing for MIAs to be performed using only the synthetic data. Specifically, we developed three different scenarios: (S1) Black-box access to the generator, (S2) only access to the released synthetic dataset and (S3) a theoretical setup as upper bound for the attack performance using only synthetic data. Our results show that MIAs are still successful, across two real-world datasets and two synthetic data generators. These results show how the strong hypothesis made when auditing synthetic data releases – access to an auxiliary dataset – can be relaxed, making the attacks more realistic in practice.
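The core idea of the abstract is to run a shadow-modeling MIA with synthetic data standing in for the auxiliary dataset. The minimal Python sketch below illustrates that idea and loosely mirrors scenario S2, where the attacker sees only the released synthetic dataset. It assumes, as shadow modeling usually does, that the attacker can run the same generator algorithm. It is not the authors' method. Everything in it is a hypothetical stand-in: the toy generator `fit_and_sample` (independent per-column marginals), the attack `feature` (how often the target's values appear in a synthetic dataset), and the midpoint-threshold meta-classifier in `shadow_threshold`. Only the standard library is used.

```python
import random
from collections import Counter


def fit_and_sample(train, n_out, rng):
    """Hypothetical toy generator: fit independent per-column marginals on
    `train` (a list of equal-length tuples) and sample `n_out` records."""
    n_cols = len(train[0])
    cols = []
    for c in range(n_cols):
        counts = Counter(row[c] for row in train)
        values, weights = zip(*counts.items())
        # Sample the whole column at once from its empirical marginal.
        cols.append(rng.choices(values, weights=weights, k=n_out))
    return list(zip(*cols))


def feature(synth, target):
    """Hypothetical attack feature: average, over columns, of how often the
    target's value in that column appears in the synthetic dataset."""
    n = len(synth)
    per_col = [sum(row[c] == v for row in synth) / n for c, v in enumerate(target)]
    return sum(per_col) / len(per_col)


def shadow_threshold(aux, target, n_train, n_synth, n_shadow, rng):
    """Shadow modeling with synthetic data as the auxiliary dataset: draw
    shadow training sets from `aux`, insert the target into every other one,
    train the toy generator on each, and return a decision threshold."""
    member_feats, nonmember_feats = [], []
    for i in range(n_shadow):
        base = rng.sample(aux, n_train - 1)
        is_member = i % 2 == 0
        # Members get the target; non-members get a random auxiliary record.
        extra = target if is_member else rng.choice(aux)
        synth = fit_and_sample(base + [extra], n_synth, rng)
        (member_feats if is_member else nonmember_feats).append(feature(synth, target))
    # Midpoint of the two class means: the simplest possible meta-classifier.
    return (sum(member_feats) / len(member_feats)
            + sum(nonmember_feats) / len(nonmember_feats)) / 2


def infer_membership(released, target, threshold):
    """Predict membership if the released synthetic data scores above the
    shadow-derived threshold."""
    return feature(released, target) > threshold
```

The shadow training sets are drawn from synthetic data. Whenever the target was a member, they therefore already carry traces of it, which shifts the shadow feature distributions away from those a real auxiliary dataset would give. This toy makes no attempt to correct for that shift.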

arXiv information

Authors	Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye
Published	2023-09-21 12:06:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google