Off-Policy Correction For Multi-Agent Reinforcement Learning

Summary

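Multi-agent reinforcement learning (MARL) offers a framework for problems in which several agents interact. Although such problems look similar to the single-agent case, they are often harder to train and harder to analyze theoretically. This work proposes MA-Trace, a new on-policy actor-critic algorithm that extends V-Trace to the MARL setting. Its main advantage is high scalability in a multi-worker setting. To achieve this, MA-Trace uses importance sampling as an off-policy correction method, which allows the computation to be distributed without affecting the quality of training. The algorithm is also theoretically grounded: the authors prove a fixed-point theorem that guarantees convergence. The algorithm is evaluated extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all of its tasks and exceeds state-of-the-art results on some of them.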

Summary (original)

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded – we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.
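
The abstract names importance sampling as the off-policy correction that MA-Trace inherits from V-Trace. As a rough illustration only, the sketch below computes V-Trace-style value targets with truncated importance weights over one trajectory. It is not the authors' implementation. The function name `vtrace_targets`, its parameters, and the assumption that the joint importance ratio is the product of per-agent ratios (agents acting independently given their observations) are all hypothetical choices made for this example.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-Trace-style value targets for one trajectory (illustrative sketch).

    rewards[t]      -- shared team reward at step t
    values[t]       -- critic estimate V(x_t)
    bootstrap_value -- critic estimate for the state after the last step
    log_ratios[t]   -- per-agent log(pi/mu) for the actions taken at step t
                       (assumption: the joint ratio is their product)
    """
    values = list(values)
    steps = len(rewards)
    next_values = values[1:] + [bootstrap_value]
    targets = [0.0] * steps
    correction = 0.0  # holds v_{t+1} - V(x_{t+1}) while walking backwards

    for t in reversed(range(steps)):
        # Joint importance ratio: sum of logs equals log of the product.
        ratio = math.exp(sum(log_ratios[t]))
        rho = min(rho_bar, ratio)  # truncated weight on the TD error
        c = min(c_bar, ratio)      # truncated weight on the trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        correction = delta + gamma * c * correction
        targets[t] = values[t] + correction
    return targets


# On-policy data (all log-ratios zero) reduces to plain n-step returns.
on_policy = vtrace_targets(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    bootstrap_value=0.0,
    log_ratios=[[0.0, 0.0]] * 3,
    gamma=1.0,
)
print(on_policy)
```

Truncating the ratios at `rho_bar` and `c_bar` bounds the variance of the correction, at the cost of some bias when the behavior and target policies differ substantially.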

arXiv information

Authors	Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś
Published	2024-04-03 17:13:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
