Differential Privacy Under Class Imbalance: Methods and Empirical Insights

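Summary

Imbalanced learning arises in classification settings where the distribution of class labels in the training data is highly skewed, such as rare-disease prediction and fraud detection.
This class imbalance poses a significant algorithmic challenge, and it can be further exacerbated when privacy-preserving techniques such as differential privacy (DP) are applied to protect sensitive training data.
Our work formalizes these challenges and provides a number of algorithmic solutions.
We consider DP variants of pre-processing methods that privately augment the original dataset to reduce class imbalance.
These include oversampling, SMOTE, and private synthetic data generation.
We also consider DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance.
These include model bagging, class-weighted empirical risk minimization (ERM), and class-weighted deep learning.
For each method, we either adapt an existing imbalanced learning technique to the private setting or demonstrate its incompatibility with differential privacy.
Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings.
We find that private synthetic data methods perform well as a data pre-processing step, while class-weighted ERM is an alternative in higher-dimensional settings where private synthetic data suffers from the curse of dimensionality.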

Abstract (original)

Imbalanced learning occurs in classification settings where the distribution of class-labels is highly skewed in the training data, such as when predicting rare diseases or in fraud detection. This class imbalance presents a significant algorithmic challenge, which can be further exacerbated when privacy-preserving techniques such as differential privacy are applied to protect sensitive training data. Our work formalizes these challenges and provides a number of algorithmic solutions. We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance; these include oversampling, SMOTE, and private synthetic data generation. We also consider DP variants of in-processing techniques, which adjust the learning algorithm to account for the imbalance; these include model bagging, class-weighted empirical risk minimization and class-weighted deep learning. For each method, we either adapt an existing imbalanced learning technique to the private setting or demonstrate its incompatibility with differential privacy. Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We find that private synthetic data methods perform well as a data pre-processing step, while class-weighted ERMs are an alternative in higher-dimensional settings where private synthetic data suffers from the curse of dimensionality.
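As a rough illustration of the class-weighting idea mentioned in the abstract, the sketch below computes inverse-frequency class weights and a class-weighted logistic loss on a toy 9:1 dataset. It is non-private and is not the authors' algorithm: the weighting rule n / (k * n_c), the function names `class_weights` and `weighted_logistic_loss`, and the toy data are all assumptions made for illustration. Making such an objective differentially private would require further steps that this sketch omits.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights n / (k * n_c) for each class c (an assumed, common heuristic)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_logistic_loss(theta, X, y, weights):
    """Mean class-weighted logistic loss for binary labels in {0, 1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * v for t, v in zip(theta, x_i))  # linear score theta . x_i
        s = 1 if y_i == 1 else -1                   # map label to {-1, +1}
        m = -s * z
        # log(1 + exp(m)) computed in a numerically stable way
        loss = max(m, 0.0) + math.log1p(math.exp(-abs(m)))
        total += weights[y_i] * loss
    return total / len(y)


# Toy data (hypothetical): nine majority examples and one minority example.
y = [0] * 9 + [1]
X = [[1.0, 0.0]] * 9 + [[1.0, 1.0]]

w = class_weights(y)
loss_at_zero = weighted_logistic_loss([0.0, 0.0], X, y, w)
print(w, loss_at_zero)
```

With these weights the single minority example contributes as much to the objective as the nine majority examples combined, which is the intended effect of class weighting: the minority class is not drowned out during training.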

arXiv information

Authors	Lucas Rosenblatt, Yuliia Lut, Eitan Turok, Marco Avella-Medina, Rachel Cummings
Published	2024-11-08 17:46:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
