Unsynchronized Decentralized Q-Learning: Two Timescale Analysis By Persistence

Summary

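Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), in which agents update their behavior as they learn.
Many theoretical advances in MARL sidestep the challenge of non-stationarity by coordinating the agents' policy updates in various ways, including synchronizing the times at which agents are allowed to revise their policies.
Synchronization makes it possible to analyze many MARL algorithms with multi-timescale methods, but such synchronization is infeasible in many decentralized applications.
This paper studies an unsynchronized variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games.
We provide sufficient conditions under which the unsynchronized algorithm drives play to equilibrium with high probability.
Our solution uses constant learning rates in the Q-factor updates, which we show to be critical for relaxing the synchronization assumptions of earlier work.
Our analysis also applies to unsynchronized generalizations of several other algorithms from the regret-testing tradition, whose performance is analyzed by multi-timescale methods that study the Markov chains induced by the policy-update dynamics.
This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected independently, and it tames non-stationarity without imposing the coordination assumptions of prior work.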
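
Why a constant learning rate matters can be seen in a deliberately simplified setting. The sketch below is not the paper's update rule: it drops states, actions and the discounted next-state term and keeps only the stochastic-approximation step `q += a_n * (c_n - q)` that Q-factor updates share. The hypothetical cost stream from `make_costs` has mean 1.0 until another agent revises its policy at an unannounced time (step 1000, an assumed value) and mean 3.0 afterwards. A decreasing step size `1/n` averages over the whole history, while a constant step size (0.05, also an assumption) discounts stale data geometrically.

```python
import random


def track(costs, step_size):
    """Run the stochastic-approximation update q <- q + a_n * (c_n - q)."""
    q = 0.0
    for n, c in enumerate(costs, start=1):
        q += step_size(n) * (c - q)
    return q


def make_costs(seed=0, horizon=2000, switch_at=1000):
    # Hypothetical environment: the expected cost seen by this agent is 1.0
    # until another agent revises its policy at a time this agent does not
    # know (switch_at), after which it becomes 3.0. Noise is uniform on
    # [-0.5, 0.5].
    rng = random.Random(seed)
    costs = []
    for t in range(horizon):
        mean = 1.0 if t < switch_at else 3.0
        costs.append(mean + rng.uniform(-0.5, 0.5))
    return costs


costs = make_costs()
q_decreasing = track(costs, lambda n: 1.0 / n)  # equals the running sample mean
q_constant = track(costs, lambda n: 0.05)       # exponentially forgets old costs
print(f"1/n step size:      {q_decreasing:.2f}")
print(f"constant step 0.05: {q_constant:.2f}")
```

With `1/n` the estimate settles near the overall average of about 2.0, which describes neither regime, whereas the constant step size ends close to the post-switch mean of 3.0. This is one informal way to see why an agent that cannot know when its opponents revise their policies needs Q-factors that can forget.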

Summary (original)

Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchronization is infeasible in many decentralized applications. In this paper, we study an unsynchronized variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the unsynchronized algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing the synchronization assumptions of earlier work. Our analysis also applies to unsynchronized generalizations of a number of other algorithms from the regret testing tradition, whose performance is analyzed by multi-timescale methods that study Markov chains obtained via policy update dynamics. This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected in an independent manner, and tames non-stationarity without imposing the coordination assumptions of prior work.
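
To make the moving parts concrete, here is a toy sketch of an unsynchronized decentralized Q-learning loop. Everything in it is an illustrative assumption rather than the paper's specification: a stateless 2×2 team game (so there is no next-state term), the function name `run_unsynchronized_dql`, all parameter values, and the choice not to reset Q-factors between phases. Each agent follows a baseline action, explores with probability `rho`, and updates its Q-factors with a constant step size `alpha`. Only at the end of its own exploration phase does it revise: it keeps its baseline if that action is a `delta`-best response to its current Q-factors; otherwise it still keeps it with probability `inertia` and else switches to a uniformly chosen `delta`-best action. The two agents use different phase lengths, so their revisions are not synchronized.

```python
import random


def run_unsynchronized_dql(phase_lengths=(150, 230), horizon=30_000,
                           alpha=0.1, rho=0.1, delta=0.2, inertia=0.5, seed=1):
    """Toy unsynchronized decentralized Q-learning in a 2x2 team game.

    Hypothetical sketch: the game, parameters and no-reset choice are
    assumptions for illustration only.
    """
    rng = random.Random(seed)
    n_agents, n_actions = 2, 2
    baseline = [0, 1]  # start mis-coordinated
    q = [[0.0] * n_actions for _ in range(n_agents)]
    for t in range(1, horizon + 1):
        # Each agent plays its baseline action, exploring uniformly w.p. rho.
        acts = [b if rng.random() > rho else rng.randrange(n_actions)
                for b in baseline]
        cost = 0.0 if acts[0] == acts[1] else 1.0  # shared cost: 0 iff matched
        for i in range(n_agents):
            a = acts[i]
            # Constant-step Q-factor update (stateless game: no next-state term).
            q[i][a] += alpha * (cost - q[i][a])
            # Agent i revises only at the end of its OWN phase; phase lengths
            # differ across agents, so revisions are unsynchronized.
            if t % phase_lengths[i] == 0:
                best = min(q[i])
                not_delta_best = q[i][baseline[i]] > best + delta
                if not_delta_best and rng.random() > inertia:
                    candidates = [b for b in range(n_actions)
                                  if q[i][b] <= best + delta]
                    baseline[i] = rng.choice(candidates)
    return baseline


final = run_unsynchronized_dql()
print("final baseline policies:", final)
```

With these settings the agents typically leave the mis-coordinated start (0, 1) and settle on a matching pair of actions, which are the pure equilibria of this game. The random inertia step keeps them from repeatedly switching in lockstep when their phase boundaries happen to fall close together.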

arXiv information

Authors	Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel
Published	2025-03-18 16:30:39+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
