RESPRECT: Speeding-up Multi-fingered Grasping with Residual Reinforcement Learning

Summary

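Deep Reinforcement Learning (DRL) has proven effective for learning control policies with robotic grippers, but it is far less practical for grasping with dexterous hands, especially on real robotic platforms, because of the high dimensionality of the problem.
This work focuses on the multi-fingered grasping task with the anthropomorphic hand of the iCub humanoid.
The authors propose RESPRECT (RESidual learning with PREtrained CriTics), a method that starts from a policy pre-trained on a large set of objects and learns a residual policy to grasp a novel object in a fraction of the timesteps needed to train a policy from scratch ($\sim 5 \times$ faster), without requiring any task demonstration.
To the authors' knowledge, this is the first Residual Reinforcement Learning (RRL) approach that learns a residual policy on top of another policy that was itself pre-trained with DRL.
Some components of the pre-trained policy are reused during residual learning to speed up training further.
The results are benchmarked in the iCub simulated environment, and the authors show that RESPRECT can be used effectively to learn a multi-fingered grasping policy on the real iCub robot.
The code to reproduce the experiments is released together with the paper under an open source license.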
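
The idea of a residual policy acting on top of a pre-trained one can be illustrated with a minimal sketch. The abstract does not state how the two actions are combined, so the additive composition with clipping below is an assumption based on the common RRL formulation, not the authors' implementation. The names `combine_actions` and `make_residual_policy`, the action bounds of [-1, 1], and the choice to feed the base action to the residual policy are all hypothetical.

```python
def combine_actions(base_action, residual_action, low=-1.0, high=1.0):
    """Add the residual to the base action element-wise, then clip.

    Assumption: actions are normalized to [low, high]; the source does
    not specify the action range or the composition rule.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    return [min(high, max(low, b + r)) for b, r in zip(base_action, residual_action)]


def make_residual_policy(pretrained_policy, residual_policy):
    """Wrap a frozen pre-trained policy and a trainable residual policy.

    Both arguments are plain callables here (hypothetical stand-ins for
    neural networks). Only the residual policy would be updated during
    training on the novel object.
    """
    def act(observation):
        base = pretrained_policy(observation)
        # Assumption: the residual policy sees the observation and the base action.
        residual = residual_policy(observation, base)
        return combine_actions(base, residual)

    return act


# Toy usage with constant policies for a 3-dimensional action.
pretrained = lambda obs: [0.5, -0.25, 0.75]
residual = lambda obs, base: [0.25, 0.0, 0.5]
policy = make_residual_policy(pretrained, residual)
print(policy([0.0]))  # -> [0.75, -0.25, 1.0] (last component clipped)
```

Because the residual starts near zero, the combined policy initially behaves like the pre-trained one, which is the usual motivation for this design. The reuse of pre-trained components mentioned in the summary is not modeled in this sketch.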

Summary (original)

Deep Reinforcement Learning (DRL) has proven effective in learning control policies using robotic grippers, but much less practical for solving the problem of grasping with dexterous hands — especially on real robotic platforms — due to the high dimensionality of the problem. In this work, we focus on the multi-fingered grasping task with the anthropomorphic hand of the iCub humanoid. We propose the RESidual learning with PREtrained CriTics (RESPRECT) method that, starting from a policy pre-trained on a large set of objects, can learn a residual policy to grasp a novel object in a fraction ($\sim 5 \times$ faster) of the timesteps required to train a policy from scratch, without requiring any task demonstration. To our knowledge, this is the first Residual Reinforcement Learning (RRL) approach that learns a residual policy on top of another policy pre-trained with DRL. We exploit some components of the pre-trained policy during residual learning that further speed-up the training. We benchmark our results in the iCub simulated environment, and we show that RESPRECT can be effectively used to learn a multi-fingered grasping policy on the real iCub robot. The code to reproduce the experiments is released together with the paper with an open source license.

arXiv information

Authors	Federico Ceola, Lorenzo Rosasco, Lorenzo Natale
Published	2024-01-26 13:38:28+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
