Client Selection for Federated Policy Optimization with Environment Heterogeneity




The development of Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL), including several policy gradient methods, that have achieved both theoretical soundness and empirical success on a variety of tasks. The theory of PI is rich in the context of centralized learning, but its study under the federated setting is still in its infancy. This paper explores the federated version of Approximate PI (API) and derives its error bound, taking into account the approximation error introduced by environment heterogeneity. We theoretically prove that a proper client selection scheme can reduce this error bound. Based on this theoretical result, we propose a client selection algorithm to alleviate the additional approximation error caused by environment heterogeneity. Experimental results show that the proposed algorithm outperforms other biased and unbiased client selection methods on the federated mountain car problem by effectively selecting clients with a lower level of heterogeneity from the population distribution.
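To make the setting concrete, here is a minimal sketch of federated API with heterogeneity-aware client selection, assuming each client is a small tabular MDP that shares a reward table `R` but has its own transition tensor `P[s][a][t]`. It is not the paper's algorithm. The heterogeneity score is a hypothetical stand-in: it measures the L1 distance between a client's dynamics and the population-average dynamics, which the server is assumed to know. All function names (`evaluate_q`, `select_clients`, `federated_api`) are illustrative. Selected clients evaluate the current policy in their own environments. The server averages their Q-estimates and improves the policy greedily.

```python
from typing import List

# Hypothetical tabular setting: P[s][a][t] = Pr(next state t | state s, action a)
# for one client's environment; R[s][a] is a reward table shared by all clients.
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def evaluate_q(P: Transitions, R: Rewards, policy: List[int],
               gamma: float = 0.9, iters: int = 200) -> List[List[float]]:
    """Approximate policy evaluation on one client's MDP via repeated Bellman backups."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [R[s][policy[s]] + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
             for s in range(n_s)]
    return [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
             for a in range(n_a)] for s in range(n_s)]


def mean_transitions(clients: List[Transitions]) -> Transitions:
    """Population-average dynamics (assumed known here; a real system would estimate it)."""
    K = len(clients)
    ref = clients[0]
    return [[[sum(c[s][a][t] for c in clients) / K for t in range(len(ref[s][a]))]
             for a in range(len(ref[s]))] for s in range(len(ref))]


def heterogeneity(P: Transitions, P_bar: Transitions) -> float:
    """Hypothetical heterogeneity score: L1 distance from the population-average dynamics."""
    return sum(abs(p - q)
               for Ps, Qs in zip(P, P_bar)
               for Pa, Qa in zip(Ps, Qs)
               for p, q in zip(Pa, Qa))


def select_clients(clients: List[Transitions], m: int) -> List[int]:
    """Pick the m clients whose dynamics are closest to the population average."""
    P_bar = mean_transitions(clients)
    scores = [heterogeneity(P, P_bar) for P in clients]
    # sorted() is stable, so ties keep their original client order
    return sorted(range(len(clients)), key=lambda k: scores[k])[:m]


def federated_api(clients: List[Transitions], R: Rewards, selected: List[int],
                  rounds: int = 20, gamma: float = 0.9) -> List[int]:
    """Selected clients evaluate locally; the server averages Q and improves greedily."""
    n_s, n_a = len(R), len(R[0])
    policy = [0] * n_s
    for _ in range(rounds):
        qs = [evaluate_q(clients[k], R, policy, gamma) for k in selected]
        q_avg = [[sum(q[s][a] for q in qs) / len(qs) for a in range(n_a)]
                 for s in range(n_s)]
        policy = [max(range(n_a), key=lambda a: q_avg[s][a]) for s in range(n_s)]
    return policy
```

For example, suppose two clients share dynamics `A` and a third client has outlier dynamics `B`. Then `select_clients([A, B, A], 2)` returns `[0, 2]`, and `federated_api` runs policy iteration on the average of those two clients' Q-estimates only. The outlier's dynamics never bias the aggregated evaluation. Swapping in uniform random selection would let `B` enter the average. That is the kind of heterogeneity-induced approximation error the abstract refers to.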


Authors: Zhijie Xie, S. H. Song
Published: 2023-05-18 13:48:20+00:00
arXiv: arxiv_id (pdf)


Category: cs.LG