CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning

Abstract

Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
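The idea of turning constraint violations into a probability of terminating future rewards can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the clipped linear mapping from violation to probability, the `p_max` and `max_violations` parameters, and the choice to weight returns by the survival probability instead of sampling terminations are all assumptions made for illustration. The abstract states only that a violation triggers a probability of terminating the rewards the agent could still collect.

```python
from typing import List, Sequence


def termination_probability(violations: Sequence[float],
                            max_violations: Sequence[float],
                            p_max: float = 1.0) -> float:
    """Map constraint violations to a termination probability in [0, p_max].

    Assumption: a violation value <= 0 means the constraint is satisfied,
    and the probability grows linearly until it saturates at max_violations.
    """
    delta = 0.0
    for c, c_max in zip(violations, max_violations):
        scaled = min(max(c / c_max, 0.0), 1.0)  # clip to [0, 1]
        delta = max(delta, p_max * scaled)      # the worst constraint dominates
    return delta


def discounted_returns(rewards: Sequence[float],
                       deltas: Sequence[float],
                       gamma: float = 0.99) -> List[float]:
    """Expected discounted returns under per-step stochastic termination.

    Assumption: with probability deltas[t] the episode ends at step t and the
    agent loses the reward at t and everything after it, so
    G_t = (1 - delta_t) * (r_t + gamma * G_{t+1}).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        survive = 1.0 - deltas[t]
        running = survive * (rewards[t] + gamma * running)
        returns[t] = running
    return returns
```

With all `deltas` equal to zero, `discounted_returns` reduces to the usual discounted return, which is one way to see why such a scheme would need only a small change to an off-the-shelf algorithm such as Proximal Policy Optimization: only the return (or advantage) targets change, not the optimizer.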

arXiv information

Authors	Elliot Chane-Sane, Pierre-Alexandre Leziart, Thomas Flayols, Olivier Stasse, Philippe Souères, Nicolas Mansard
Published	2024-03-27 17:03:31+00:00

Source, services used

arxiv.jp, Google
