Latent Plans for Task-Agnostic Offline Reinforcement Learning

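Summary

Long-horizon everyday tasks, which consist of a sequence of multiple implicit subtasks, remain a major challenge in offline robot control.
Many prior methods have tried to address this setting with variants of imitation learning and offline reinforcement learning, but the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals.
Because the two paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both to learn task-agnostic long-horizon policies from high-dimensional camera observations.
Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy, learned by offline reinforcement learning, that chains skills using the latent behavior priors.
Experiments on a variety of simulated and real robot control tasks show that our formulation can produce previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills through goal chaining, with an order-of-magnitude improvement in performance over state-of-the-art baselines.
We also learn a single multi-task visuomotor policy for 25 distinct manipulation tasks in the real world, which outperforms both imitation learning and offline reinforcement learning techniques.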
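
The two-level structure described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the linear "policies", the toy dynamics, the dimensions, and the fixed re-planning interval are all hypothetical stand-ins, chosen only to show how a high-level policy might select a latent skill that a low-level policy then decodes into actions for several steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
OBS_DIM, LATENT_DIM, ACTION_DIM = 8, 4, 3

# Random linear maps standing in for learned networks (assumption).
W_HIGH = rng.normal(size=(LATENT_DIM, 2 * OBS_DIM))
W_LOW = rng.normal(size=(ACTION_DIM, OBS_DIM + LATENT_DIM))


def high_level_policy(obs, goal):
    """Stand-in for the offline-RL policy: choose a latent skill."""
    return np.tanh(W_HIGH @ np.concatenate([obs, goal]))


def low_level_policy(obs, latent):
    """Stand-in for the imitation-learned policy: decode an action."""
    return np.tanh(W_LOW @ np.concatenate([obs, latent]))


def toy_env_step(obs, action):
    """Hypothetical dynamics: the action nudges the first ACTION_DIM dims."""
    next_obs = obs.copy()
    next_obs[:ACTION_DIM] += 0.1 * action
    return next_obs


def rollout(obs, goal, num_skills=3, skill_horizon=5):
    """Chain several latent skills, re-planning every skill_horizon steps."""
    trajectory = [obs]
    for _ in range(num_skills):
        latent = high_level_policy(obs, goal)  # pick the next skill
        for _ in range(skill_horizon):
            action = low_level_policy(obs, latent)  # execute the skill
            obs = toy_env_step(obs, action)
            trajectory.append(obs)
    return np.stack(trajectory)


trajectory = rollout(np.zeros(OBS_DIM), np.ones(OBS_DIM))
```

In this sketch the high-level policy is queried once per skill and the low-level policy once per time step, which is the division of labor the summary describes; how the latent skills and goals are actually represented and chained is specified in the paper itself.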

Abstract (original)

Everyday tasks of long-horizon and comprising a sequence of multiple implicit subtasks still impose a major challenge in offline robot control. While a number of prior methods aimed to address this setting with variants of imitation and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both methods to learn task-agnostic long-horizon policies from high-dimensional camera observations. Concretely, we combine a low-level policy that learns latent skills via imitation learning and a high-level policy learned from offline reinforcement learning for skill-chaining the latent behavior priors. Experiments in various simulated and real robot control tasks show that our formulation enables producing previously unseen combinations of skills to reach temporally extended goals by ‘stitching’ together latent skills through goal chaining with an order-of-magnitude improvement in performance upon state-of-the-art baselines. We even learn one multi-task visuomotor policy for 25 distinct manipulation tasks in the real world which outperforms both imitation learning and offline reinforcement learning techniques.

arXiv information

Authors	Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard
Published	2022-09-19 12:27:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
