Privacy Side Channels in Machine Learning Systems

Summary

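Most current approaches to protecting privacy in machine learning (ML) assume that models exist in a vacuum, but in practice ML models are part of larger systems that include components such as training data filtering and output monitoring.
This work introduces privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is possible for standalone models.
We propose four categories of side channels spanning the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering), which enable enhanced membership inference attacks and even novel threats such as extracting users' test queries.
For example, we show that deduplicating training data before applying differentially private training creates a side channel that completely invalidates any provable privacy guarantees.
Moreover, we show that systems which block language models from regenerating training data can be exploited to exactly reconstruct private keys contained in the training set, even if the model did not memorize those keys.
Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning.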
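
The deduplication side channel can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the greedy keep-first rule in `greedy_dedup`, the Euclidean distance threshold of 1.0, and the attacker-chosen points from `attacker_points`. The idea is that the attacker contributes records that are each a near-duplicate of the target but not of one another, so the presence of a single target record changes how many records survive deduplication.

```python
import math


def greedy_dedup(records, threshold=1.0):
    """Hypothetical deduplicator: keep a record only if it is farther than
    `threshold` from every record kept so far."""
    kept = []
    for record in records:
        if all(math.dist(record, other) > threshold for other in kept):
            kept.append(record)
    return kept


def attacker_points(dim, scale=0.9):
    """Scaled basis vectors: each lies 0.9 from the origin (a near-duplicate
    of the target) but about 1.27 from every other attacker point."""
    return [tuple(scale if i == j else 0.0 for j in range(dim)) for i in range(dim)]


DIM = 8
target = tuple(0.0 for _ in range(DIM))  # record whose membership is in question
poison = attacker_points(DIM)            # records contributed by the attacker

with_target = greedy_dedup([target] + poison)
without_target = greedy_dedup(poison)

# One record added to the raw data changes many records after deduplication.
size_gap = abs(len(with_target) - len(without_target))
print(len(with_target), len(without_target), size_gap)
```

In this sketch the two raw datasets differ by one record, yet the deduplicated datasets that a differentially private trainer would actually see differ by several. A guarantee stated for datasets differing in a single record therefore no longer describes the end-to-end system.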

Summary (original)

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models. We propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and allow for either enhanced membership inference attacks or even novel threats such as extracting users’ test queries. For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees. Moreover, we show that systems which block language models from regenerating training data can be exploited to allow exact reconstruction of private keys contained in the training set — even if the model did not memorize these keys. Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning.
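
The output-filter side channel mentioned in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the training document, the 12-character window, the `system_response` function, and the "echo" model that simply returns whatever text it is asked to produce are all hypothetical. The model here memorizes nothing; the only signal the attacker uses is whether the filter blocks an output.

```python
import string

# Hypothetical training document containing a secret (made up for this sketch).
TRAINING_DOCS = ["user=alice api_key=K7Q2ZP9X end"]
WINDOW = 12  # assumed filter granularity: block any 12-character training substring


def build_filter(docs, window):
    """Collect every `window`-length substring of the training documents."""
    grams = set()
    for doc in docs:
        for i in range(len(doc) - window + 1):
            grams.add(doc[i:i + window])
    return grams


BLOCKED = build_filter(TRAINING_DOCS, WINDOW)


def system_response(requested_output):
    """Toy deployed system: an echo model followed by the filter.
    Returns None when the filter blocks the output."""
    for i in range(len(requested_output) - WINDOW + 1):
        if requested_output[i:i + WINDOW] in BLOCKED:
            return None
    return requested_output


def extract(prefix, length, alphabet):
    """Extend a known prefix one character at a time, keeping the character
    whose trailing window triggers the filter."""
    recovered = prefix
    for _ in range(length):
        for ch in alphabet:
            candidate = recovered + ch
            if system_response(candidate[-WINDOW:]) is None:
                recovered = candidate
                break
        else:
            break  # no character was blocked, so stop extending
    return recovered


recovered = extract("alice api_key=", 8, string.ascii_uppercase + string.digits)
print(recovered)
```

Each guess costs at most one query per alphabet symbol, so the secret is recovered with a number of queries linear in its length. The sketch assumes the attacker already knows a prefix of at least `WINDOW - 1` characters that precedes the secret.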

arXiv information

Authors	Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
Published	2023-09-11 16:49:05+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
