Imitation Learning based Alternative Multi-Agent Proximal Policy Optimization for Well-Formed Swarm-Oriented Pursuit Avoidance

Summary

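Multi-robot systems (MRS) have attracted broad research interest and have enabled many interesting applications, particularly in cooperative control.
However, the combined capabilities of formation, monitoring, and defence in decentralized large-scale MRS for pursuit avoidance remain largely unexplored, and they place stringent requirements on coordination and adaptability.
In this paper, we propose a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm, which provides a flexible and communication-economic solution for executing pursuit avoidance tasks in a well-formed swarm.
In particular, a policy-distillation based MAPPO executor is first devised to efficiently execute, and quickly switch between, multiple formations in a centralized manner.
Furthermore, imitation learning is used to decentralize the formation controller, reducing communication overhead and improving scalability.
Alternative training is then employed to compensate for the performance loss caused by decentralization.
Simulation results validate the effectiveness of IA-MAPPO, and extensive ablation experiments further show performance comparable to a centralized solution with significantly lower communication overhead.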
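The summary describes the policy-distillation based executor only at a high level. Policy distillation commonly trains a single student policy to match the action distributions of several teacher policies by minimizing a KL divergence. The sketch below illustrates that general idea under stated assumptions: discrete actions, one hypothetical teacher per formation (each outputting a fixed distribution, for brevity), a formation one-hot appended to the student's input, and a linear softmax student trained by plain gradient descent. Every name and number here (`LinearSoftmaxPolicy`, `distill_step`, `TEACHERS`, `lr=0.5`) is illustrative and is not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class LinearSoftmaxPolicy:
    """Hypothetical student policy: action logits = W x (no bias term)."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def probs(self, x):
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in self.W])


def distill_step(student, batch, lr=0.5):
    """One gradient-descent step on the mean KL(teacher || student).

    For a softmax student, the gradient of KL(p_t || p_s) with respect to
    the logits is p_s - p_t, so dL/dW[a][j] = (p_s[a] - p_t[a]) * x[j].
    Returns the mean loss measured before the update.
    """
    grads = [[0.0] * len(row) for row in student.W]
    loss = 0.0
    for x, p_teacher in batch:
        p_student = student.probs(x)
        loss += kl_divergence(p_teacher, p_student)
        for a in range(len(p_student)):
            diff = p_student[a] - p_teacher[a]
            for j, xj in enumerate(x):
                grads[a][j] += diff * xj
    n = len(batch)
    for a, row in enumerate(student.W):
        for j in range(len(row)):
            row[j] -= lr * grads[a][j] / n
    return loss / n


# Hypothetical teachers: each formation's teacher outputs a fixed
# distribution over 3 discrete actions (a simplification for the sketch).
TEACHERS = {0: [0.8, 0.1, 0.1], 1: [0.1, 0.1, 0.8]}


def make_batch(rng, n=64, n_formations=2):
    """Student input = 2 local observation features + one-hot formation id."""
    batch = []
    for _ in range(n):
        k = rng.randrange(n_formations)
        obs = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        onehot = [1.0 if i == k else 0.0 for i in range(n_formations)]
        batch.append((obs + onehot, TEACHERS[k]))
    return batch


batch = make_batch(random.Random(1))
student = LinearSoftmaxPolicy(n_features=4, n_actions=3)
first_loss = distill_step(student, batch)
for _ in range(300):
    last_loss = distill_step(student, batch)
```

Conditioning one student on a formation id is a simple way to let a single network switch between formations; the paper may use a different conditioning scheme or network.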

Summary (original)

Multi-Robot System (MRS) has garnered widespread research interest and fostered a great many interesting applications, especially in cooperative control. Yet little light has been shed on the compound ability of formation, monitoring and defence in decentralized large-scale MRS for pursuit avoidance, which places stringent requirements on coordination and adaptability. In this paper, we put forward a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm to provide a flexible and communication-economic solution for executing the pursuit avoidance task in a well-formed swarm. In particular, a policy-distillation based MAPPO executor is first devised to accomplish, and swiftly switch between, multiple formations in a centralized manner. Furthermore, we utilize imitation learning to decentralize the formation controller, so as to reduce communication overheads and enhance scalability. Afterwards, alternative training is leveraged to compensate for the performance loss incurred by decentralization. The simulation results validate the effectiveness of IA-MAPPO, and extensive ablation experiments further show performance comparable to a centralized solution with a significant decrease in communication overheads.
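MAPPO, the base algorithm named in IA-MAPPO, is commonly described as PPO applied to each agent, with advantages estimated from a centralized value function. As a hedged illustration of that building block only, the sketch below computes PPO's clipped surrogate loss over a batch of (agent, timestep) samples, assuming the advantages are already given. The function name, the default `clip_eps=0.2`, and the sample values are illustrative assumptions; the sketch does not reproduce the authors' alternative-training procedure.

```python
import math


def ppo_clipped_surrogate_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative mean of PPO's clipped surrogate objective.

    Each entry is one (agent, timestep) sample: the log-probability of the
    taken action under the current and the behaviour policy, plus an
    advantage estimate (in MAPPO, usually from a centralized value function).
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|o) / pi_old(a|o)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: element-wise minimum of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing this loss maximizes the surrogate objective.
    return -total / len(advantages)


# Usage: three samples whose probability ratios are 1.0, 1.5 and 0.5.
old = [math.log(0.4), math.log(0.2), math.log(0.6)]
new = [math.log(0.4), math.log(0.3), math.log(0.3)]
adv = [1.0, 1.0, -1.0]
loss = ppo_clipped_surrogate_loss(new, old, adv)
```

Here the second sample's ratio of 1.5 is clipped to 1.2, and the third sample (negative advantage) takes the clipped value 0.8, so the per-sample terms are 1.0, 1.2 and -0.8.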

arXiv information

Authors	Sizhao Li, Yuming Xiang, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
Published	2023-11-06 06:58:16+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
