It’s My Data Too: Private ML for Datasets with Multi-User Training Examples

Summary

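This paper initiates a study of algorithms for training models with user-level differential privacy (DP) in the multi-attribution model, where each example may be attributed to multiple users.
It first gives a carefully chosen definition of user-level DP for the multi-attribution model.
Training in this model is facilitated by solving the contribution bounding problem: selecting a subset of the dataset in which each user is associated with only a limited number of examples.
The authors propose a greedy baseline algorithm for the contribution bounding problem.
They then study this algorithm empirically on a synthetic logistic regression task and a transformer training task, including variants of the baseline that optimize the chosen subset using different techniques and criteria.
The baseline remains competitive with its variants in most settings, and the experiments give a better understanding of the practical importance of the bias-variance tradeoff inherent in solutions to the contribution bounding problem.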
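
To make the contribution bounding problem concrete, here is a minimal Python sketch of one plausible greedy strategy: scan the examples in order and keep an example only if every user attributed to it is still below the per-user limit. The abstract does not spell out the authors' algorithm, so the function name `greedy_contribution_bounding`, the limit `k`, and the representation of each example as a set of user IDs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Set


def greedy_contribution_bounding(
    examples: Sequence[Set[Hashable]], k: int
) -> List[int]:
    """Return indices of a subset in which each user appears in at most k examples.

    Hypothetical sketch: examples[i] is the set of user IDs attributed to
    example i. This is not necessarily the paper's exact procedure.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    counts: Counter = Counter()  # examples kept so far, per user
    selected: List[int] = []
    for idx, users in enumerate(examples):
        users = set(users)  # guard against duplicate user IDs in one example
        # Keep the example only if no attributed user has reached the limit.
        if all(counts[u] < k for u in users):
            selected.append(idx)
            counts.update(users)
    return selected


if __name__ == "__main__":
    data = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]
    print(greedy_contribution_bounding(data, k=1))
    print(greedy_contribution_bounding(data, k=2))
```

In this sketch a smaller `k` discards more examples, while a larger `k` keeps more data but lets each user influence more of the training set. That is one way to picture the tradeoff the summary refers to.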

Abstract (original)

We initiate a study of algorithms for model training with user-level differential privacy (DP), where each example may be attributed to multiple users, which we call the multi-attribution model. We first provide a carefully chosen definition of user-level DP under the multi-attribution model. Training in the multi-attribution model is facilitated by solving the contribution bounding problem, i.e. the problem of selecting a subset of the dataset for which each user is associated with a limited number of examples. We propose a greedy baseline algorithm for the contribution bounding problem. We then empirically study this algorithm for a synthetic logistic regression task and a transformer training task, including studying variants of this baseline algorithm that optimize the subset chosen using different techniques and criteria. We find that the baseline algorithm remains competitive with its variants in most settings, and build a better understanding of the practical importance of a bias-variance tradeoff inherent in solutions to the contribution bounding problem.

arXiv information

Authors	Arun Ganesh, Ryan McKenna, Brendan McMahan, Adam Smith, Fan Wu
Published	2025-03-05 16:02:09+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
