Leading the Pack: N-player Opponent Shaping

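Summary

Reinforcement learning has had great success in 2-player general-sum settings.
In this setting, the Opponent Shaping (OS) paradigm, in which agents account for the learning of their co-players, lets agents avoid collectively bad outcomes while still maximizing their own reward.
So far, these methods have been limited to 2-player games.
The real world, however, involves interactions with many more agents, on both local and global scales.
This paper extends OS methods to environments with multiple co-players and multiple shaping agents.
The authors evaluate on over 4 different environments, varying the number of players from 3 to 5, and show that model-based OS methods converge to equilibria with better global welfare than naive learning.
However, when playing with a large number of co-players, the relative performance of OS methods declines, suggesting that in the limit OS methods may not perform well.
Finally, they explore scenarios where more than one OS method is present and observe that, in games requiring a majority of agents to cooperate, OS methods converge to outcomes with poor global welfare.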

Summary (original)

Reinforcement learning solutions have great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have currently been limited to 2-player game. However, the real world involves interactions with many more agents, with interactions on both local and global scales. In this paper, we extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents. We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning. However, we find that when playing with a large number of co-players, OS methods’ relative performance reduces, suggesting that in the limit OS methods may not perform well. Finally, we explore scenarios where more than one OS method is present, noticing that within games requiring a majority of cooperating agents, OS methods converge to outcomes with poor global welfare.

arXiv information

Authors	Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel
Published	2023-12-26 11:23:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
