DecAP: Decaying Action Priors for Accelerated Imitation Learning of Torque-Based Legged Locomotion Policies

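Summary

Optimal control for legged robots has undergone a paradigm shift from position-based to torque-based control. In parallel with this shift, the community has also turned to Deep Reinforcement Learning (DRL) as a promising approach for directly learning locomotion policies for complex real-world tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we propose a two-stage framework. In the first stage, we generate our own imitation data by training a position-based policy, which removes the need for expert knowledge to design optimal controllers. The second stage incorporates decaying action priors, a novel method that enhances the exploration of torque-based policies aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and remains robust when these rewards are scaled from 0.1x to 10x. We further validate the benefits of torque control by comparing the robustness of a position-based policy with that of a position-assisted torque-based policy on a quadruped robot (Unitree Go1), without any domain randomization in the form of external disturbances during training.

Summary (original)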

Optimal Control for legged robots has gone through a paradigm shift from position-based to torque-based control, owing to the latter’s compliant and robust nature. In parallel to this shift, the community has also turned to Deep Reinforcement Learning (DRL) as a promising approach to directly learn locomotion policies for complex real-life tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we propose a two-stage framework. In the first stage, we generate our own imitation data by training a position-based policy, eliminating the need for expert knowledge to design optimal controllers. The second stage incorporates decaying action priors, a novel method to enhance the exploration of torque-based policies aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and is robust to scaling these rewards from 0.1x to 10x. We further validate the benefits of torque control by comparing the robustness of a position-based policy to a position-assisted torque-based policy on a quadruped (Unitree Go1) without any domain randomization in the form of external disturbances during training.
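The abstract does not say how the action prior is built or how it decays, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the prior is a PD torque that pulls the joints toward the targets of the stage-one position policy, and that its weight shrinks exponentially with the training iteration so the torque policy gradually takes over. The function names, the gains `kp` and `kd`, and the decay rate `gamma` are all hypothetical.

```python
import numpy as np


def pd_prior_torque(q_ref, q, qd, kp=20.0, kd=0.5):
    """Assumed action prior: PD torque toward the position policy's joint targets."""
    return kp * (q_ref - q) - kd * qd


def decay_weight(iteration, gamma=0.99):
    """Assumed schedule: the prior's weight decays exponentially with the iteration."""
    return gamma ** iteration


def blended_torque(policy_torque, q_ref, q, qd, iteration, gamma=0.99):
    """Torque sent to the joints: policy output plus the decayed prior."""
    weight = decay_weight(iteration, gamma)
    return policy_torque + weight * pd_prior_torque(q_ref, q, qd)


if __name__ == "__main__":
    q_ref = np.array([0.1, -0.2])        # hypothetical targets from the position policy
    q = np.zeros(2)                      # current joint positions
    qd = np.zeros(2)                     # current joint velocities
    policy_torque = np.array([1.0, 1.0]) # hypothetical torque-policy output

    for it in (0, 100, 5000):
        print(it, blended_torque(policy_torque, q_ref, q, qd, it))
```

Early in training the prior dominates and guides exploration; after many iterations the weight is negligible and the command is effectively the torque policy's own output.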

arXiv information

Authors	Shivam Sood, Ge Sun, Peizhuo Li, Guillaume Sartoretti
Published	2024-09-02 08:55:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
