DecAP: Decaying Action Priors for Accelerated Learning of Torque-Based Legged Locomotion Policies

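Summary

Optimal control of legged robots has undergone a paradigm shift from position-based to torque-based control, owing to the compliance and robustness of the latter.
In parallel with this shift, the community has turned to deep reinforcement learning (DRL) as a promising approach for directly learning locomotion policies for complex real-world tasks.
However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is sample-inefficient and does not consistently converge to natural gaits.
To address these challenges, we propose a two-stage framework.
In the first stage, we generate our own imitation data by training a position-based policy, which removes the need for expert knowledge to design optimal controllers.
The second stage incorporates decaying action priors, a novel method that uses imitation rewards to enhance the exploration of torque-based policies.
We show that our approach consistently outperforms imitation learning alone and is robust to scaling these rewards from 0.1x to 10x.
We further validate the benefits of torque control by comparing the robustness of a position-based policy with that of a position-assisted torque-based policy on a quadruped (Unitree Go1), with no domain randomization in the form of external disturbances during training.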

Abstract (original)

Optimal Control for legged robots has gone through a paradigm shift from position-based to torque-based control, owing to the latter’s compliant and robust nature. In parallel to this shift, the community has also turned to Deep Reinforcement Learning (DRL) as a promising approach to directly learn locomotion policies for complex real-life tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we propose a two-stage framework. In the first stage, we generate our own imitation data by training a position-based policy, eliminating the need for expert knowledge to design optimal controllers. The second stage incorporates decaying action priors, a novel method to enhance the exploration of torque-based policies aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and is robust to scaling these rewards from 0.1x to 10x. We further validate the benefits of torque control by comparing the robustness of a position-based policy to a position-assisted torque-based policy on a quadruped (Unitree Go1) without any domain randomization in the form of external disturbances during training.
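
The abstract does not say how the decaying action prior enters the torque policy. One plausible reading, offered only as a rough sketch, is that the position-based policy's joint targets are converted to torques with a PD law and added to the torque policy's output with a weight that shrinks over training. Everything below is an assumption made for illustration and is not taken from the paper: the function names `pd_torque`, `prior_weight`, and `blended_torque`, the gains `kp` and `kd`, the exponential schedule with `gamma` and `decay_every`, and the `torque_limit` value.

```python
import numpy as np

def pd_torque(q_target, q, qd, kp=20.0, kd=0.5):
    """PD law mapping joint position targets to torques (gains are hypothetical)."""
    return kp * (q_target - q) - kd * qd

def prior_weight(step, gamma=0.99, decay_every=100):
    """Assumed schedule: weight 1.0 at step 0, decaying exponentially toward 0."""
    return gamma ** (step / decay_every)

def blended_torque(policy_torque, q_target, q, qd, step, torque_limit=20.0):
    """Add the decayed PD prior to the torque policy's output, then clip.

    q_target would come from the stage-one position policy; torque_limit is
    a placeholder, not a Unitree Go1 specification.
    """
    beta = prior_weight(step)
    tau = policy_torque + beta * pd_torque(q_target, q, qd)
    return np.clip(tau, -torque_limit, torque_limit)

# Usage: 12 joints, torque policy still outputs zeros, prior pulls toward q_target.
q = np.zeros(12)
qd = np.zeros(12)
q_target = np.full(12, 0.1)
early = blended_torque(np.zeros(12), q_target, q, qd, step=0)
late = blended_torque(np.zeros(12), q_target, q, qd, step=100_000)
```

Under these assumptions the prior dominates early in training, when the torque policy explores poorly, and its contribution fades as `step` grows, leaving the learned torques in control.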

arXiv information

Authors	Shivam Sood, Ge Sun, Peizhuo Li, Guillaume Sartoretti
Published	2024-03-31 16:01:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
