Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

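Summary

We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items.
The fundamental problem we address, known as the order-picking problem, is how these worker agents should coordinate their movements and actions within the warehouse to maximise performance on this task.
Established industry methods based on heuristics require substantial engineering effort to optimise for inherently variable warehouse configurations.
In contrast, multi-agent reinforcement learning (MARL) can be applied flexibly to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency) and to different order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), because the agents can learn through experience how to cooperate optimally.
We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the manager and worker policies are co-trained to maximise a global objective (e.g. pick rate).
Our hierarchical algorithms achieve significantly better sample efficiency than baseline MARL algorithms, and significantly higher overall pick rates than multiple established industry heuristics, across a diverse set of warehouse configurations and order-picking paradigms.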

Summary (original)

We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms.
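
The manager/worker split described in the abstract can be pictured with a small toy sketch. This is not the authors' algorithm: no learning takes place, and the grid world, the `Worker` class, and the functions `manager_assign`, `worker_step`, and `run_episode` are all hypothetical names invented for illustration. The random manager and the greedy worker stand in for the learned policies, and the returned value plays the role of the global objective (pick rate) that both levels would be trained to maximise.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Position = Tuple[int, int]


@dataclass
class Worker:
    pos: Position
    goal: Optional[Position] = None  # item location assigned by the manager


def manager_assign(workers: List[Worker], pending: List[Position],
                   rng: random.Random) -> None:
    """Placeholder manager policy: give each idle worker a random unassigned item."""
    taken = {w.goal for w in workers if w.goal is not None}
    free = [p for p in pending if p not in taken]
    rng.shuffle(free)
    for w in workers:
        if w.goal is None and free:
            w.goal = free.pop()


def worker_step(w: Worker) -> None:
    """Placeholder worker policy: move one grid cell toward the assigned goal."""
    if w.goal is None:
        return
    (x, y), (gx, gy) = w.pos, w.goal
    if x != gx:
        x += 1 if gx > x else -1
    elif y != gy:
        y += 1 if gy > y else -1
    w.pos = (x, y)


def run_episode(workers: List[Worker], items: List[Position],
                steps: int, seed: int = 0) -> float:
    """Run a fixed number of steps and return picks per step (toy pick rate)."""
    rng = random.Random(seed)
    pending = set(items)
    picks = 0
    for _ in range(steps):
        manager_assign(workers, sorted(pending), rng)
        for w in workers:
            worker_step(w)
            if w.goal is not None and w.pos == w.goal:
                pending.discard(w.goal)  # item collected
                picks += 1
                w.goal = None            # worker becomes idle again
    return picks / steps


if __name__ == "__main__":
    team = [Worker((0, 0)), Worker((3, 3))]
    print(run_episode(team, [(0, 2), (3, 1)], steps=10))
```

In a learned version, both placeholder policies would be replaced by trainable ones, and the single shared pick-rate signal would drive the updates of manager and workers alike.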

arXiv information

Authors	Aleksandar Krnjaic, Raul D. Steleac, Jonathan D. Thomas, Georgios Papoudakis, Lukas Schäfer, Andrew Wing Keung To, Kuan-Ho Lao, Murat Cubuktepe, Matthew Haley, Peter Börsting, Stefano V. Albrecht
Published	2024-08-30 14:07:35+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
