Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations

Summary

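Generalizable object manipulation skills are essential for intelligent, multi-functional robots to operate in complex real-world scenes.
Despite recent progress in reinforcement learning, learning a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects remains very challenging.
In this work, we tackle this category-level object manipulation policy learning problem through imitation learning in a task-agnostic manner, assuming only a terminal reward and no handcrafted dense rewards.
For this novel and challenging generalizable policy learning problem, we identify several key issues that can cause previous imitation learning algorithms to fail and that hinder generalization to unseen instances.
We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of the discriminator, and instance balancing for the expert buffer, which pinpoint and address these issues and can benefit category-level manipulation policy learning regardless of the task.
Experiments on the ManiSkill benchmarks show a remarkable improvement on all tasks, and ablation studies further validate the contribution of each proposed technique.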

Summary (original)

Generalizable object manipulation skills are critical for intelligent and multi-functional robots to work in real-world complex scenes. Despite the recent progress in reinforcement learning, it is still very challenging to learn a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects. In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner, where we assume no handcrafted dense rewards but only a terminal reward. Given this novel and challenging generalizable policy learning problem, we identify several key issues that can fail the previous imitation learning algorithms and hinder the generalization to unseen instances. We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of discriminator, and instance-balancing for expert buffer, that accurately pinpoints and tackles these issues and can benefit category-level manipulation policy learning regardless of the tasks. Our experiments on ManiSkill benchmarks demonstrate a remarkable improvement on all tasks and our ablation studies further validate the contribution of each proposed technique.

arXiv information

Authors	Hao Shen, Weikang Wan, He Wang
Published	2022-09-13 14:59:04+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
