Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach

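Abstract

Cooperative multi-agent problems often require coordination among agents, which can be achieved through a centralized policy that takes the global state into account.
Multi-agent policy gradient (MAPG) methods are commonly used to learn such policies, but they are often limited to problems with low-level action spaces.
For complex problems with large state and action spaces, it is advantageous to extend MAPG methods to use higher-level actions, also called options, to improve the efficiency of policy search.
However, multi-robot option execution is often asynchronous: agents may select and complete their options at different time steps.
Because a centralized policy always selects new options at the same time, this makes it difficult for MAPG methods to derive a centralized policy and evaluate its gradient.
This work proposes a novel conditional reasoning approach to address this problem and demonstrates its effectiveness on representative option-based multi-agent cooperative tasks through empirical validation.
Code and videos are available at https://sites.google.com/view/mahrlsupp/.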

Abstract (original)

Cooperative multi-agent problems often require coordination between agents, which can be achieved through a centralized policy that considers the global state. Multi-agent policy gradient (MAPG) methods are commonly used to learn such policies, but they are often limited to problems with low-level action spaces. In complex problems with large state and action spaces, it is advantageous to extend MAPG methods to use higher-level actions, also known as options, to improve the policy search efficiency. However, multi-robot option executions are often asynchronous, that is, agents may select and complete their options at different time steps. This makes it difficult for MAPG methods to derive a centralized policy and evaluate its gradient, as a centralized policy always selects new options at the same time. In this work, we propose a novel, conditional reasoning approach to address this problem and demonstrate its effectiveness on representative option-based multi-agent cooperative tasks through empirical validation. Find code and videos at: https://sites.google.com/view/mahrlsupp/
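
The asynchrony described in the abstract can be made concrete with a small toy simulation. The sketch below is not the authors' method; it only illustrates the setting under stated assumptions: the option names and durations in `OPTIONS`, the helper `choose_conditionally`, and the rule "prefer options nobody else is currently executing" are all hypothetical stand-ins for a learned centralized policy. At each time step only the agents whose options have finished receive a new one, and that choice is conditioned on the options the other agents are still executing.

```python
import random

# Hypothetical options with fixed durations in time steps (an assumption for illustration).
OPTIONS = {"goto_A": 3, "goto_B": 2, "wait": 1}


def choose_conditionally(done, ongoing, rng):
    """Stand-in for a centralized policy: pick options for finished agents only,
    conditioned on the options the remaining agents are still executing."""
    taken = set(ongoing.values())
    picks = {}
    for i in done:
        # Prefer options no other agent is executing; fall back to all options.
        free = [o for o in sorted(OPTIONS) if o not in taken] or sorted(OPTIONS)
        picks[i] = rng.choice(free)
        taken.add(picks[i])
    return picks


def step(current, remaining, rng):
    """Advance one time step and return the agents that selected a new option."""
    done = [i for i, r in enumerate(remaining) if r == 0]
    if done:
        ongoing = {i: o for i, o in enumerate(current) if i not in done}
        for i, option in choose_conditionally(done, ongoing, rng).items():
            current[i] = option
            remaining[i] = OPTIONS[option]
    # Every agent executes one step of its current option.
    for i in range(len(remaining)):
        remaining[i] -= 1
    return done


def simulate(n_agents=3, horizon=6, seed=0):
    """Run the toy loop and log (time step, agents that re-selected, joint option)."""
    rng = random.Random(seed)
    current = [None] * n_agents
    remaining = [0] * n_agents
    log = []
    for t in range(horizon):
        done = step(current, remaining, rng)
        log.append((t, done, list(current)))
    return log


if __name__ == "__main__":
    for t, done, joint in simulate():
        print(t, done, joint)
```

Printing the log shows that after the first step the agents re-select at different times, which is the situation a policy that always selects all options at once cannot represent directly.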

arXiv information

Authors	Xubo Lyu, Amin Banitalebi-Dehkordi, Mo Chen, Yong Zhang
Published	2023-08-02 05:57:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
