Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning

Summary

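Recent work has shown that intricate cooperative behaviors can emerge in agents trained with meta-reinforcement learning on open-ended task distributions using self-play.
While these results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training over an open-ended distribution of tasks.
In this work, we therefore investigate the emergence of collective exploration strategies when several agents meta-learn independent recurrent policies on an open-ended distribution of tasks.
To this end, we introduce a novel environment with an open-ended, procedurally generated task space that dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees.
We show that decentralized agents trained in our environment exhibit strong generalization when confronted with novel objects at test time.
Moreover, despite never being forced to cooperate during training, the agents learn collective exploration strategies that allow them to solve novel tasks never encountered during training.
We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees twice as deep as those seen during training.
Our open-source code and videos of the agents are available on our companion website.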

Summary (original)

Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta reinforcement learning on open ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open ended distribution of tasks. To this end we introduce a novel environment with an open ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents learned collective exploration strategies extend to an open ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open source code as well as videos of the agents can be found on our companion website.

arXiv information

Authors	Richard Bornemann, Gautier Hamon, Eleni Nisioti, Clément Moulin-Frier
Published	2023-11-02 10:35:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
