To Trust or Not To Trust Prediction Scores for Membership Inference Attacks

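Summary

Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model.
Knowing this can lead to a real privacy breach.
Most MIAs, however, rely on the model's prediction scores (the probability of each output given some input), following the intuition that a trained model tends to behave differently on its training data.
We argue that this is a fallacy for many modern deep network architectures.
As a result, MIAs fail badly: overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data, and so implicitly acts as a defense against MIAs.
Specifically, using generative adversarial networks, we can produce a potentially infinite number of samples that are falsely classified as part of the training data.
In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed.
Moreover, there is in fact a trade-off between a model's overconfidence and its susceptibility to MIAs: the better a classifier knows when it does not know, and so makes low-confidence predictions, the more it reveals about its training data.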

Summary (original)

Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model. Knowing this may indeed lead to a privacy breach. Most MIAs, however, make use of the model’s prediction scores – the probability of each output given some input – following the intuition that the trained model tends to behave differently on its training data. We argue that this is a fallacy for many modern deep network architectures. Consequently, MIAs will miserably fail since overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data and implicitly acts as a defense against MIAs. Specifically, using generative adversarial networks, we are able to produce a potentially infinite number of samples falsely classified as part of the training data. In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed. Moreover, there is actually a trade-off between the overconfidence of models and their susceptibility to MIAs: the more classifiers know when they do not know, making low confidence predictions, the more they reveal the training data.
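
To make the false-positive argument concrete, here is a minimal sketch of a score-based attack that flags a sample as a training member whenever the top softmax score reaches a threshold. This is an illustrative assumption, not the attack evaluated by the authors: the function names, the threshold of 0.9, and the toy logits are all hypothetical. It shows how an overconfident model pushes non-member samples over the threshold, while a model that outputs low-confidence scores on unfamiliar inputs does not.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_score_attack(logits, threshold=0.9):
    """Hypothetical threshold attack: predict 'member' if the top score >= threshold."""
    return max(softmax(logits)) >= threshold


def false_positive_rate(non_member_logits, threshold=0.9):
    """Fraction of non-member samples that the attack wrongly flags as members."""
    flagged = sum(max_score_attack(l, threshold) for l in non_member_logits)
    return flagged / len(non_member_logits)


# Toy logits, made up for illustration, for samples that were NOT in the training set.
overconfident = [[9.0, 0.5, 0.1], [0.2, 8.0, 0.3], [7.5, 0.0, 0.4], [1.0, 1.2, 0.9]]
low_confidence = [[2.0, 1.0, 0.5], [0.8, 1.5, 0.9], [1.6, 0.7, 1.0], [1.0, 1.2, 0.9]]

fpr_over = false_positive_rate(overconfident)
fpr_low = false_positive_rate(low_confidence)
print(fpr_over, fpr_low)
```

On these toy inputs the overconfident model yields a much higher false-positive rate than the low-confidence one. Read together with the abstract's trade-off, this suggests the attack's scores become informative only once non-members stop receiving high confidence.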

arXiv information

Authors	Dominik Hintersdorf, Lukas Struppek, Kristian Kersting
Published	2023-01-24 14:56:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
