Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

Summary

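To improve the efficiency of warehouse systems and meet large volumes of customer orders, we aim to solve the challenges of the curse of dimensionality and dynamic properties in hyper scale multi-robot task planning (MRTP) for robotic mobile fulfillment systems (RMFS).
Existing research has shown that hierarchical reinforcement learning (HRL) is an effective way to mitigate these challenges.
Building on this, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper scale MRTP in RMFS, in which the planning process is represented with a special temporal graph topology.
To ensure optimality, the planner is designed with a centralized architecture, but this brings the challenges of scaling up and generalization: the policy must maintain its performance on a variety of unlearned scales and maps.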
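To address these difficulties, we first construct a hierarchical temporal attention network (HTAN) that provides the basic ability to handle inputs of unfixed length, and then design multi-stage curricula for hierarchical policy learning that further improve scaling-up and generalization ability while avoiding catastrophic forgetting.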
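The summary does not describe the internals of HTAN. As a hedged illustration of the general mechanism that lets an attention layer accept inputs of unfixed length (for example, a varying number of racks or robots), the sketch below implements single-query scaled dot-product attention pooling with a padding mask. It is not the authors' HTAN: the function name `masked_attention_pool` and the toy rack features are hypothetical, and the hierarchical and temporal parts of the network are not modeled.

```python
import math
from typing import List, Sequence


def masked_attention_pool(query: Sequence[float],
                          keys: List[Sequence[float]],
                          values: List[Sequence[float]],
                          mask: List[bool]) -> List[float]:
    """Pool a variable-length set of entity features into one fixed-size vector.

    Single-query scaled dot-product attention (hypothetical helper, not HTAN):
    padded entries (mask False) get zero weight, so padding an input to a
    common batch length does not change the result. Needs one valid entry.
    """
    if not any(mask):
        raise ValueError("at least one entry must be unmasked")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if valid else -math.inf
        for key, valid in zip(keys, mask)
    ]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Hypothetical rack features: 3 real racks, then the same racks padded to 5.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
short = masked_attention_pool(query, keys, values, [True] * 3)
padded = masked_attention_pool(
    query,
    keys + [[9.0, 9.0]] * 2,
    values + [[100.0, 100.0]] * 2,
    [True] * 3 + [False] * 2,
)
print(short, padded)  # identical vectors of length 2
```

The output size depends only on the value dimension, never on how many entities are present, which is the property a centralized planner needs when the number of robots or racks changes between training and deployment.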
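We also observe that policies with a hierarchical structure suffer from unfair credit assignment, similar to that in multi-agent reinforcement learning. Inspired by this, we propose a hierarchical reinforcement learning algorithm with a counterfactual rollout baseline to improve learning performance.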
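The summary does not spell out the counterfactual rollout baseline. The sketch below is only an illustration of the idea, similar in spirit to counterfactual baselines in multi-agent RL, under several assumptions: two policy levels (an upper level choosing, say, a rack and a lower level choosing a robot), a deterministic episode return, and one fixed default action per level. For each level, the baseline is a rollout in which only that level's action is swapped for its default while the other level's action is held fixed. The names `counterfactual_advantages`, `toy_return`, and the return table are hypothetical, with made-up numbers.

```python
from typing import Callable, Dict

# Hypothetical signature: deterministic return of one episode given the
# upper-level (e.g. rack) action and the lower-level (e.g. robot) action.
ReturnFn = Callable[[int, int], float]


def counterfactual_advantages(rollout_return: ReturnFn,
                              high_action: int, low_action: int,
                              high_default: int, low_default: int) -> Dict[str, float]:
    """Per-level advantages from counterfactual rollouts (illustrative sketch).

    Each level's baseline re-runs the episode with only that level's action
    swapped for a default action, holding the other level's action fixed, so
    each level is credited only for the difference its own choice made.
    """
    actual = rollout_return(high_action, low_action)
    high_baseline = rollout_return(high_default, low_action)
    low_baseline = rollout_return(high_action, low_default)
    return {"high": actual - high_baseline, "low": actual - low_baseline}


# Toy return table (negative completion time, made-up numbers).
TABLE = {(0, 0): -10.0, (0, 1): -8.0, (1, 0): -12.0, (1, 1): -9.0}


def toy_return(high: int, low: int) -> float:
    return TABLE[(high, low)]


adv = counterfactual_advantages(toy_return, high_action=1, low_action=1,
                                high_default=0, low_default=0)
print(adv)  # {'high': -1.0, 'low': 3.0}
```

A single shared baseline such as the return of the default pair, R(0, 0) = -10, would give both levels the same advantage of +1. The counterfactual version instead penalizes the upper-level choice (-1) and rewards the lower-level choice (+3). In a REINFORCE-style update, each level's advantage would then multiply only that level's log-probability.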
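Experimental results show that our planner outperforms other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS.
Our planner also scales up successfully to hyper scale MRTP instances in RMFS with up to 200 robots and 1000 retrieval racks on unlearned maps, while maintaining superior performance over other methods.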

Summary (original)

To improve the efficiency of warehousing systems and meet huge customer orders, we aim to solve the challenges of dimension disaster and dynamic properties in hyper scale multi-robot task planning (MRTP) for robotic mobile fulfillment system (RMFS). Existing research indicates that hierarchical reinforcement learning (HRL) is an effective method to reduce these challenges. Based on that, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper scale MRTP in RMFS, and the planning process is represented with a special temporal graph topology. To ensure optimality, the planner is designed with a centralized architecture, but it also brings the challenges of scaling up and generalization that require policies to maintain performance for various unlearned scales and maps. To tackle these difficulties, we first construct a hierarchical temporal attention network (HTAN) to ensure the basic ability of handling inputs with unfixed lengths, and then design multi-stage curricula for hierarchical policy learning to further improve the scaling up and generalization ability while avoiding catastrophic forgetting. Additionally, we notice that policies with hierarchical structure suffer from unfair credit assignment that is similar to that in multi-agent reinforcement learning, inspired by which, we propose a hierarchical reinforcement learning algorithm with counterfactual rollout baseline to improve learning performance. Experimental results demonstrate that our planner outperforms other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS. Also, our planner can successfully scale up to hyper scale MRTP instances in RMFS with up to 200 robots and 1000 retrieval racks on unlearned maps while keeping superior performance over other methods.

arXiv information

Authors	Xuan Zhou, Xiang Shi, Lele Zhang, Chen Chen, Hongbo Li, Lin Ma, Fang Deng, Jie Chen
Published	2024-12-27 09:07:11+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
