Interactive and Concentrated Differential Privacy for Bandits

Summary

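Bandits are central to interactive learning schemes and modern recommender systems. These systems often depend on sensitive user data, which makes privacy a key concern. This paper studies privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). Bandits under pure $\epsilon$-global DP are well studied; the contribution here is to bandits under zero Concentrated DP (zCDP). The authors give minimax and problem-dependent regret lower bounds for finite-armed and linear bandits, quantifying the cost of $\rho$-global zCDP in these settings. The lower bounds reveal two hardness regimes determined by the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. Two $\rho$-global zCDP bandit algorithms are proposed: AdaC-UCB for finite-armed bandits and AdaC-GOPE for linear bandits. Both follow a common recipe that combines the Gaussian mechanism with adaptive episodes. The regret analysis shows that AdaC-UCB matches the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE matches the minimax regret lower bound up to poly-logarithmic factors. Experiments in several settings validate the theoretical results.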

Abstract (original)

Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure $\epsilon$-global DP have been well-studied, we contribute to the understanding of bandits under zero Concentrated DP (zCDP). We provide minimax and problem-dependent lower bounds on regret for finite-armed and linear bandits, which quantify the cost of $\rho$-global zCDP in these settings. These lower bounds reveal two hardness regimes based on the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. We propose two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, for finite-armed and linear bandits respectively. Both algorithms use a common recipe of Gaussian mechanism and adaptive episodes. We analyze the regret of these algorithms to show that AdaC-UCB achieves the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE achieves the minimax regret lower bound up to poly-logarithmic factors. Finally, we provide experimental validation of our theoretical results under different settings.
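
The "Gaussian mechanism" ingredient named in the abstract can be illustrated with a small sketch. It is not AdaC-UCB or AdaC-GOPE; it only shows the standard fact that adding Gaussian noise with standard deviation $\Delta / \sqrt{2\rho}$ to a statistic of sensitivity $\Delta$ satisfies $\rho$-zCDP. The function names, the assumption that rewards lie in $[0, 1]$, and the assumption that each reward contributes to only one released mean are illustrative choices, not details taken from the paper.

```python
import math
import random


def gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise standard deviation for which the Gaussian mechanism is rho-zCDP."""
    if sensitivity <= 0 or rho <= 0:
        raise ValueError("sensitivity and rho must be positive")
    # Gaussian noise with variance sigma^2 gives (sensitivity^2 / (2 sigma^2))-zCDP,
    # so solving for sigma yields sensitivity / sqrt(2 rho).
    return sensitivity / math.sqrt(2.0 * rho)


def private_episode_mean(rewards, rho, rng=None):
    """Release a noisy mean of one episode's rewards (hypothetical helper).

    Assumes rewards lie in [0, 1]; values outside are clipped so that
    changing a single reward moves the sum by at most 1.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    rng = rng or random.Random()
    clipped = [min(1.0, max(0.0, r)) for r in rewards]
    sigma = gaussian_sigma(1.0, rho)
    noisy_sum = sum(clipped) + rng.gauss(0.0, sigma)
    return noisy_sum / len(clipped)


if __name__ == "__main__":
    rng = random.Random(0)
    episode = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
    print(private_episode_mean(episode, rho=0.5, rng=rng))
```

Because the noise is added to the sum and then divided by the episode length, longer episodes yield a less noisy mean for the same $\rho$, which is one intuition for why grouping rewards into episodes helps.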

arXiv information

Authors	Achraf Azize, Debabrota Basu
Published	2023-09-01 16:08:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
