FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

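Summary

Many real-world applications involve agents divided into two teams, whose payoffs are equal within a team and opposite in sign between the teams.
These so-called two-team zero-sum Markov games (2t0sMGs) have become tractable with reinforcement learning in recent years.
However, existing methods are inefficient because they give insufficient consideration to intra-team credit assignment, data utilization, and computational intractability.
In this paper, we propose the individual-global-minimax (IGMM) principle, which ensures consistency between the two-team minimax behavior and the individual greedy behaviors through the Q functions of a 2t0sMG.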
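The consistency that IGMM asks for can be illustrated on a toy case. The sketch below is not the paper's factorization: it assumes a purely additive joint Q (team-1 utilities added, team-2 utilities subtracted), and every name in it (`joint_q`, `brute_force_minimax`, `decentralized_greedy`, the utility tables) is hypothetical. Under that assumption, the joint minimax action found by enumeration matches the actions each agent picks greedily from its own utility.

```python
from itertools import product

def joint_q(q_team1, q_team2, a1, a2):
    """Toy additive joint Q (an assumption): team-1 utilities add, team-2 utilities subtract."""
    return (sum(q[a] for q, a in zip(q_team1, a1))
            - sum(q[a] for q, a in zip(q_team2, a2)))

def brute_force_minimax(q_team1, q_team2):
    """Enumerate all joint actions; team 1 maximises its worst case over team 2."""
    acts1 = list(product(*(range(len(q)) for q in q_team1)))
    acts2 = list(product(*(range(len(q)) for q in q_team2)))
    best_a1 = max(
        acts1,
        key=lambda a1: min(joint_q(q_team1, q_team2, a1, a2) for a2 in acts2),
    )
    best_a2 = min(acts2, key=lambda a2: joint_q(q_team1, q_team2, best_a1, a2))
    return best_a1, best_a2

def decentralized_greedy(q_team1, q_team2):
    """Each agent looks only at its own utility table and takes the argmax."""
    a1 = tuple(max(range(len(q)), key=q.__getitem__) for q in q_team1)
    a2 = tuple(max(range(len(q)), key=q.__getitem__) for q in q_team2)
    return a1, a2

# Hypothetical per-agent utilities for one state: two agents per team.
q_team1 = [[0.2, 1.0, -0.5], [0.7, 0.1]]
q_team2 = [[0.3, 0.9], [1.5, -0.2, 0.4]]

centralized = brute_force_minimax(q_team1, q_team2)
decentralized = decentralized_greedy(q_team1, q_team2)
print(centralized, decentralized)
```

The enumeration grows exponentially with the number of agents, while the per-agent greedy choice is linear in it, which is the practical motivation for wanting such a consistency property.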
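Building on it, we present Factorized Multi-Agent MiniMax Q-Learning (FM3Q), a new multi-agent reinforcement learning framework that factorizes the joint minimax Q function into individual functions and iteratively solves for minimax Q functions that satisfy IGMM in 2t0sMGs.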
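To make "iteratively solving for a minimax Q function" concrete, here is a minimal tabular sketch. It assumes a pure-strategy max-min backup over joint team actions, which may differ from the operator used in the paper, and it involves no factorization or neural network. The function names, the transition tuple layout, and the values of `alpha` and `gamma` are all hypothetical.

```python
from collections import defaultdict

def maxmin_value(q, state, actions1, actions2):
    """Pure-strategy max-min value of a state: team 1 maximises its worst case."""
    return max(min(q[(state, a1, a2)] for a2 in actions2) for a1 in actions1)

def minimax_q_update(q, transition, actions1, actions2, alpha=0.5, gamma=0.9):
    """One temporal-difference step towards r + gamma * maxmin Q(s', ., .)."""
    s, a1, a2, r, s_next, done = transition
    target = r if done else r + gamma * maxmin_value(q, s_next, actions1, actions2)
    q[(s, a1, a2)] += alpha * (target - q[(s, a1, a2)])
    return q[(s, a1, a2)]

# Joint team actions are flattened to single indices for brevity.
actions1, actions2 = (0, 1), (0, 1)
q = defaultdict(float)
q[("s1", 0, 0)], q[("s1", 0, 1)] = 1.0, 3.0
q[("s1", 1, 0)], q[("s1", 1, 1)] = 2.0, 2.5

# Team 1 played 0 and team 2 played 1 in s0, received reward 1.0, and landed in s1.
new_value = minimax_q_update(q, ("s0", 0, 1, 1.0, "s1", False), actions1, actions2)
print(new_value)
```

Repeating this update over sampled transitions is the tabular analogue of the iterative solution; the max-min inside `maxmin_value` is the step whose cost a factorized representation is meant to reduce.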
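In addition, an online learning algorithm with neural networks is proposed to implement FM3Q and to obtain deterministic, decentralized minimax policies for the players of both teams.
A theoretical analysis is provided to prove the convergence of FM3Q.
Empirically, we use three environments to evaluate the learning efficiency and final performance of FM3Q and show its superiority on 2t0sMGs.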

Summary (original)

Many real-world applications involve some agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across the opponent team. The so-called two-team zero-sum Markov games (2t0sMGs) can be resolved with reinforcement learning in recent years. However, existing methods are thus inefficient in light of insufficient consideration of intra-team credit assignment, data utilization and computational intractability. In this paper, we propose the individual-global-minimax (IGMM) principle to ensure the coherence between two-team minimax behaviors and the individual greedy behaviors through Q functions in 2t0sMGs. Based on it, we present a novel multi-agent reinforcement learning framework, Factorized Multi-Agent MiniMax Q-Learning (FM3Q), which can factorize the joint minimax Q function into individual ones and iteratively solve for the IGMM-satisfied minimax Q functions for 2t0sMGs. Moreover, an online learning algorithm with neural networks is proposed to implement FM3Q and obtain the deterministic and decentralized minimax policies for two-team players. A theoretical analysis is provided to prove the convergence of FM3Q. Empirically, we use three environments to evaluate the learning efficiency and final performance of FM3Q and show its superiority on 2t0sMGs.

arXiv information

Authors	Guangzheng Hu,Yuanheng Zhu,Haoran Li,Dongbin Zhao
Published	2024-02-01 16:37:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
