Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning

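Summary

We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to complete cooperative tasks at any level of communication at execution time by exploiting information shared among the agents.
Under hybrid execution, the communication level ranges from a setting in which no communication between agents is allowed (fully decentralized) to one with full communication (fully centralized), but the agents do not know in advance which communication level they will encounter at execution time.
To formalize this setting, we define a new class of multi-agent partially observable Markov decision processes (POMDPs), named hybrid-POMDPs, which explicitly model the communication process between agents.
We contribute MARO, an approach that uses an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations at execution time.
We evaluate MARO on standard scenarios and on extensions of previous benchmarks tailored to emphasize the negative impact of partial observability in MARL.
Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act under faulty communication while still exploiting shared information.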

Summary (original)

We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to successfully complete cooperative tasks with arbitrary communication levels at execution time by taking advantage of information-sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized), to a setting featuring full communication (fully centralized), but the agents do not know beforehand which communication level they will encounter at execution time. To formalize our setting, we define a new class of multi-agent partially observable Markov decision processes (POMDPs) that we name hybrid-POMDPs, which explicitly model a communication process between the agents. We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents’ observations at execution time. We evaluate MARO on standard scenarios and extensions of previous benchmarks tailored to emphasize the negative impact of partial observability in MARL. Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act with faulty communication while successfully exploiting shared information.
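
To make the idea of a communication level concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each agent receives each other agent's observation independently with probability `p` (so `p = 0` is fully decentralized and `p = 1` is fully centralized), and it replaces MARO's learned auto-regressive predictive model with a trivial stand-in that reuses the previous estimate. The names `sample_comm_mask`, `fill_missing`, and `hybrid_step` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Optional, Sequence

Obs = List[float]


def sample_comm_mask(n_agents: int, p: float, rng: random.Random) -> List[List[bool]]:
    """Return mask[i][j] == True when agent i receives agent j's observation.

    Assumption: links are sampled independently with probability p.
    An agent always has access to its own observation.
    """
    return [
        [i == j or rng.random() < p for j in range(n_agents)]
        for i in range(n_agents)
    ]


def fill_missing(received: Sequence[Optional[Obs]], previous: Sequence[Obs]) -> List[Obs]:
    """Stand-in predictor (assumption): keep the previous estimate for any
    observation that was not received. A learned model would go here."""
    return [list(prev) if obs is None else list(obs)
            for obs, prev in zip(received, previous)]


def hybrid_step(
    observations: Sequence[Obs],
    mask: Sequence[Sequence[bool]],
    estimates: Sequence[Sequence[Obs]],
) -> List[List[Obs]]:
    """Update every agent's estimate of the joint observation for one step."""
    n = len(observations)
    updated = []
    for i in range(n):
        # Agent i only sees the observations that the mask lets through.
        received = [observations[j] if mask[i][j] else None for j in range(n)]
        updated.append(fill_missing(received, estimates[i]))
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    obs = [[1.0], [2.0], [3.0]]
    init = [[[0.0] for _ in obs] for _ in obs]
    for p in (0.0, 0.5, 1.0):
        print(p, hybrid_step(obs, sample_comm_mask(len(obs), p, rng), init))
```

At `p = 0` each agent's estimate contains only its own observation plus the fallback values, and at `p = 1` every agent holds the full joint observation. Intermediate values of `p` produce the partially connected cases the abstract refers to as faulty communication.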

arXiv information

Authors	Pedro P. Santos, Diogo S. Carvalho, Miguel Vasco, Alberto Sardinha, Pedro A. Santos, Ana Paiva, Francisco S. Melo
Published	2023-06-05 17:35:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
