Scalable Multi-agent Skill Discovery based on Kronecker Graphs

Summary

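Covering skill (a.k.a. option) discovery was developed to improve RL exploration in single-agent scenarios with sparse reward signals by connecting the most distant states in the embedding space given by the Fiedler vector of the state-transition graph.
Because the joint state space grows exponentially with the number of agents in a multi-agent system, existing work that still relies on single-agent option discovery either becomes computationally prohibitive or fails to directly discover joint options that improve the connectivity of the joint state space.
In this paper, we show how to directly compute multi-agent options with collaborative exploratory behaviors while retaining the ease of decomposition.
Our key idea is to approximate the joint state space as a Kronecker graph, which allows its Fiedler vector to be estimated directly from the Laplacian spectra of the individual agents' transition graphs.
Because directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we also propose a deep learning extension of our method that estimates the eigenfunctions through NN-based representation learning techniques.
Evaluation on multi-agent tasks built with simulators such as Mujoco shows that the proposed algorithm successfully identifies multi-agent options and significantly outperforms state-of-the-art algorithms.
Code is available at https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.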
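
As a rough illustration of the single-agent idea mentioned above (connecting the most distant states in the Fiedler-vector embedding), here is a minimal sketch. It is not taken from the paper's code: the function names, the six-state corridor, and the choice of the unnormalized Laplacian L = D - A are all assumptions made for the example.

```python
import numpy as np

def fiedler_vector(adj):
    """Eigenvector for the second-smallest eigenvalue of L = D - A.

    Assumes `adj` is a symmetric adjacency matrix of a connected graph.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    return eigvecs[:, 1]

def covering_option_endpoints(adj):
    """Return the two states that are farthest apart in the Fiedler embedding."""
    f = fiedler_vector(adj)
    return int(np.argmin(f)), int(np.argmax(f))

# Hypothetical toy environment: a corridor of six states, 0 - 1 - 2 - 3 - 4 - 5.
n_states = 6
path_adj = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    path_adj[s, s + 1] = path_adj[s + 1, s] = 1.0

endpoints = covering_option_endpoints(path_adj)
print(endpoints)
```

On this corridor the two selected states are the two ends, so an option linking them shortcuts the longest path in the graph. Which end is reported first depends on the arbitrary sign of the eigenvector.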

Summary (original)

Covering skill (a.k.a., option) discovery has been developed to improve the exploration of RL in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. Given that joint state space grows exponentially with the number of agents in multi-agent systems, existing researches still relying on single-agent option discovery either become prohibitive or fail to directly discover joint options that improve the connectivity of the joint state space. In this paper, we show how to directly compute multi-agent options with collaborative exploratory behaviors while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector using the Laplacian spectrum of individual agents’ transition graphs. Further, considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method by estimating eigenfunctions through NN-based representation learning techniques. The evaluation on multi-agent tasks built with simulators like Mujoco, shows that the proposed algorithm can successfully identify multi-agent options, and significantly outperforms the state-of-the-art. Codes are available at: https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.
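
The Kronecker-graph idea rests on a standard spectral fact that can be checked numerically. If the joint adjacency matrix is the Kronecker product of two agents' adjacency matrices, then the symmetric normalized Laplacian of the joint graph has eigenvalues 1 - (1 - λ_i)(1 - μ_j), with eigenvectors given by the Kronecker products of the agents' eigenvectors. The sketch below uses this to obtain a joint Fiedler vector without decomposing the joint matrix. The function names and the two toy graphs are assumptions for illustration, the sketch covers only two agents, and it is not claimed to match the authors' implementation.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def joint_fiedler(adj_a, adj_b):
    """Second-smallest eigenpair of the Kronecker graph, built from factor spectra."""
    lam_a, vec_a = np.linalg.eigh(normalized_laplacian(adj_a))
    lam_b, vec_b = np.linalg.eigh(normalized_laplacian(adj_b))
    # Every joint eigenvalue has the form 1 - (1 - lam_a[i]) * (1 - lam_b[j]).
    joint_vals = 1.0 - np.outer(1.0 - lam_a, 1.0 - lam_b)
    order = np.argsort(joint_vals, axis=None)
    i, j = np.unravel_index(order[1], joint_vals.shape)  # second smallest
    return joint_vals[i, j], np.kron(vec_a[:, i], vec_b[:, j])

# Hypothetical transition graphs: a triangle with a pendant node, and a 5-cycle.
adj_a = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
cycle = np.roll(np.eye(5), 1, axis=1)
adj_b = cycle + cycle.T

lam, vec = joint_fiedler(adj_a, adj_b)

# Cross-check against a direct eigendecomposition of the 20-state joint graph.
joint_laplacian = normalized_laplacian(np.kron(adj_a, adj_b))
direct_vals = np.linalg.eigvalsh(joint_laplacian)
print(np.isclose(lam, direct_vals[1]))
```

The factor-wise route only decomposes the 4x4 and 5x5 matrices, whereas the cross-check decomposes the full 20x20 joint matrix. That gap widens quickly as agents are added, which is the scalability argument made in the abstract.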

arXiv information

Authors	Jiayu Chen, Jingdi Chen, Tian Lan, Vaneet Aggarwal
Published	2023-07-21 14:53:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
