No-regret Exploration in Shuffle Private Reinforcement Learning

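Summary

Differential privacy (DP) has recently been introduced into episodic reinforcement learning (RL) to formally address user privacy concerns in personalized services.
Previous work has focused mainly on two trust models of DP: the central model, in which a central agent is responsible for protecting users' sensitive data, and the (stronger) local model, in which protection is applied directly on the user side.
However, the central model requires a trusted central agent and the local model incurs a significantly higher privacy cost, so neither suits many scenarios.
This work introduces a trust model that is stronger than the central model yet has a lower privacy cost than the local model, leveraging the emerging \emph{shuffle} model of privacy.
We present the first generic algorithm for episodic RL under the shuffle model, in which a trusted shuffler randomly permutes a batch of users' data before sending it to the central agent.
We then instantiate the algorithm with our proposed shuffle Privatizer, which relies on a shuffle private binary summation mechanism.
Our analysis shows that the algorithm achieves a near-optimal regret bound comparable to that of the centralized model and significantly outperforms the local model in terms of privacy cost.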

Abstract (original)

Differential privacy (DP) has recently been introduced into episodic reinforcement learning (RL) to formally address user privacy concerns in personalized services. Previous work mainly focuses on two trust models of DP: the central model, where a central agent is responsible for protecting users’ sensitive data, and the (stronger) local model, where the protection occurs directly on the user side. However, they either require a trusted central agent or incur a significantly higher privacy cost, making it unsuitable for many scenarios. This work introduces a trust model stronger than the central model but with a lower privacy cost than the local model, leveraging the emerging \emph{shuffle} model of privacy. We present the first generic algorithm for episodic RL under the shuffle model, where a trusted shuffler randomly permutes a batch of users’ data before sending it to the central agent. We then instantiate the algorithm using our proposed shuffle Privatizer, relying on a shuffle private binary summation mechanism. Our analysis shows that the algorithm achieves a near-optimal regret bound comparable to that of the centralized model and significantly outperforms the local model in terms of privacy cost.
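The shuffle pipeline described in the abstract (users randomize locally, a trusted shuffler permutes the batch, the central agent aggregates) can be illustrated with a minimal sketch of a binary summation protocol. This is not the paper's Privatizer: it assumes a simple randomized-response local randomizer, and the function names and the noise probability `p` are hypothetical choices for illustration only. No privacy guarantee is claimed for any particular value of `p`.

```python
import random


def local_randomizer(bit, p, rng):
    """User side (assumed scheme): with probability p report a uniform
    random bit, otherwise report the true bit."""
    if rng.random() < p:
        return rng.randint(0, 1)
    return bit


def shuffler(messages, rng):
    """Trusted shuffler: return a random permutation of the batch, so the
    central agent cannot link a message to the user who sent it."""
    out = list(messages)
    rng.shuffle(out)
    return out


def analyzer(shuffled, p):
    """Central agent: debias the noisy sum.

    E[sum of reports] = (1 - p) * true_sum + n * p / 2,
    so the estimate below is unbiased for the true sum (requires p < 1).
    """
    n = len(shuffled)
    return (sum(shuffled) - n * p / 2) / (1 - p)


def shuffle_binary_sum(bits, p, seed=None):
    """Run the three stages end to end on a batch of user bits."""
    rng = random.Random(seed)
    messages = [local_randomizer(b, p, rng) for b in bits]
    return analyzer(shuffler(messages, rng), p)
```

With `p = 0` no noise is added and the estimate equals the exact sum. Larger `p` adds more noise per user, and the analyzer compensates for it only in expectation, so individual estimates vary from run to run.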

arXiv information

Authors	Shaojie Bai, Mohammad Sadegh Talebi, Chengcheng Zhao, Peng Cheng, Jiming Chen
Published	2024-11-18 15:24:11+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
