Watch and Match: Supercharging Imitation with Regularized Optimal Transport

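Abstract

Imitation learning holds great promise for efficiently learning policies for complex decision-making problems.
Current state-of-the-art algorithms often use inverse reinforcement learning (IRL): given a set of expert demonstrations, the agent alternates between inferring a reward function and inferring the associated optimal policy.
However, such IRL approaches often require substantial online interaction to handle complex control problems.
This work presents Regularized Optimal Transport (ROT), a new imitation learning algorithm built on recent advances in optimal-transport-based trajectory matching.
The key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation, even with only a few demonstrations.
In experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark, ROT reaches 90% of expert performance an average of 7.8 times faster than prior state-of-the-art methods.
In real-world robotic manipulation, with just one demonstration and one hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.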

Abstract (original)

Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternatively infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
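
The abstract describes two ingredients: rewards obtained from optimal-transport trajectory matching, and a behavior cloning term combined with them. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The cosine cost, the Sinkhorn solver with uniform marginals, the squared-error cloning term, and every name here (`cosine_cost`, `sinkhorn_plan`, `ot_rewards`, `regularized_actor_loss`, `bc_weight`, `epsilon`) are hypothetical choices made for illustration. How `bc_weight` is adapted during training is not stated in the abstract, so it is left to the caller.

```python
import numpy as np


def cosine_cost(agent, expert):
    """Pairwise cosine distance between agent and expert feature vectors."""
    a = agent / (np.linalg.norm(agent, axis=1, keepdims=True) + 1e-8)
    e = expert / (np.linalg.norm(expert, axis=1, keepdims=True) + 1e-8)
    return 1.0 - a @ e.T


def sinkhorn_plan(cost, epsilon=0.05, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)  # uniform mass over agent steps
    nu = np.full(m, 1.0 / m)  # uniform mass over expert steps
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(n_iters):
        v = nu / (kernel.T @ u)
        u = mu / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def ot_rewards(agent, expert, epsilon=0.05):
    """Per-step reward: negative transport cost charged to each agent step."""
    cost = cosine_cost(agent, expert)
    plan = sinkhorn_plan(cost, epsilon)
    return -np.sum(plan * cost, axis=1)


def regularized_actor_loss(q_values, policy_actions, expert_actions, bc_weight):
    """Blend an RL term with a behavior cloning term (assumed form).

    bc_weight in [0, 1] is supplied by the caller; its schedule is an
    assumption and is not taken from the abstract.
    """
    rl_term = -np.mean(q_values)
    bc_term = np.mean(np.sum((policy_actions - expert_actions) ** 2, axis=1))
    return (1.0 - bc_weight) * rl_term + bc_weight * bc_term


if __name__ == "__main__":
    expert = np.eye(4)                         # four distinct expert features
    matched = ot_rewards(expert.copy(), expert)
    off = ot_rewards(np.tile([1.0, 1.0, 0.0, 0.0], (4, 1)), expert)
    print("matched rollout reward:", matched.sum())
    print("mismatched rollout reward:", off.sum())
```

In this toy setup a rollout whose features coincide with the expert's receives a total reward close to zero, while a mismatched rollout receives a clearly negative one. Setting `bc_weight` to 1 reduces the loss to pure behavior cloning, and setting it to 0 leaves only the reward-driven term.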

arXiv information

Authors	Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto
Published	2023-02-20 20:54:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
