Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

Abstract

Recent advances in GPU-based parallel simulation have enabled practitioners to collect large amounts of data and train complex control policies using deep reinforcement learning (RL), on commodity GPUs. However, such successes for RL in robotics have been limited to tasks sufficiently simulated by fast rigid-body dynamics. Simulation techniques for soft bodies are comparatively several orders of magnitude slower, thereby limiting the use of RL due to sample complexity requirements. To address this challenge, this paper presents both a novel RL algorithm and a simulation platform to enable scaling RL on tasks involving rigid bodies and deformables. We introduce Soft Analytic Policy Optimization (SAPO), a maximum entropy first-order model-based actor-critic RL algorithm, which uses first-order analytic gradients from differentiable simulation to train a stochastic actor to maximize expected return and entropy. Alongside our approach, we develop Rewarped, a parallel differentiable multiphysics simulation platform that supports simulating various materials beyond rigid bodies. We re-implement challenging manipulation and locomotion tasks in Rewarped, and show that SAPO outperforms baselines over a range of tasks that involve interaction between rigid bodies, articulations, and deformables. Additional details at https://rewarped.github.io/.
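
The abstract describes SAPO as training a stochastic actor with first-order analytic gradients from a differentiable simulator, so as to maximize expected return plus entropy. The sketch below illustrates only that core idea on a made-up 1-D point-mass task. The dynamics, reward, constants, and the functions make_noise and objective_and_grad are all hypothetical and do not come from the paper or Rewarped. It uses a reparameterized Gaussian policy, a hand-written backward pass through the dynamics, and a Gaussian entropy bonus. It leaves out the critic and everything else that makes SAPO an actor-critic method. It also loops over rollouts one at a time, where a GPU simulator would batch them.

```python
import math
import random

# Hypothetical toy task (not from the paper or Rewarped): a 1-D point mass
# with dynamics x_{t+1} = x_t + DT * a_t and reward -x_{t+1}^2 - c * a_t^2.
DT, HORIZON, ACTION_COST, ALPHA = 0.1, 20, 0.01, 0.1
# Entropy of N(mu, std^2) is 0.5 * log(2*pi*e) + log(std).
GAUSS_ENTROPY_CONST = 0.5 * math.log(2.0 * math.pi * math.e)


def make_noise(n_envs, seed=0):
    """Standard-normal noise, one sequence per rollout, held fixed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(HORIZON)] for _ in range(n_envs)]


def objective_and_grad(k, log_std, noise, x0=1.0):
    """Mean return plus entropy bonus over the rollouts in `noise`, and its
    analytic gradient w.r.t. (k, log_std) via a hand-written backward pass
    through the differentiable dynamics."""
    std = math.exp(log_std)
    total_ret, grad_k, grad_ls = 0.0, 0.0, 0.0
    for eps_seq in noise:
        # Forward: reparameterized Gaussian policy a_t = -k * x_t + std * eps_t.
        xs, acts, ret = [x0], [], 0.0
        for eps in eps_seq:
            a = -k * xs[-1] + std * eps
            x_next = xs[-1] + DT * a
            ret += -x_next ** 2 - ACTION_COST * a ** 2
            xs.append(x_next)
            acts.append(a)
        total_ret += ret
        # Backward: lam = d(return of steps > t) / d x_{t+1}, zero at the end.
        lam = 0.0
        for t in reversed(range(HORIZON)):
            g_x = lam - 2.0 * xs[t + 1]                   # d/d x_{t+1}
            g_a = g_x * DT - 2.0 * ACTION_COST * acts[t]  # d/d a_t
            grad_k += g_a * (-xs[t])                      # d a_t / d k
            grad_ls += g_a * std * eps_seq[t]             # d a_t / d log_std
            lam = g_x + g_a * (-k)                        # d/d x_t
    n = len(noise)
    entropy_bonus = ALPHA * HORIZON * (GAUSS_ENTROPY_CONST + log_std)
    objective = total_ret / n + entropy_bonus
    return objective, grad_k / n, grad_ls / n + ALPHA * HORIZON


noise = make_noise(64)  # common random numbers across all evaluations
k, log_std = 0.5, 0.0
J_start, grad_k0, grad_ls0 = objective_and_grad(k, log_std, noise)

# Check the analytic gradient against central finite differences.
h = 1e-5
fd_k = (objective_and_grad(k + h, log_std, noise)[0]
        - objective_and_grad(k - h, log_std, noise)[0]) / (2 * h)
fd_ls = (objective_and_grad(k, log_std + h, noise)[0]
         - objective_and_grad(k, log_std - h, noise)[0]) / (2 * h)
print(f"dJ/dk       analytic={grad_k0:.6f}  finite-diff={fd_k:.6f}")
print(f"dJ/dlog_std analytic={grad_ls0:.6f}  finite-diff={fd_ls:.6f}")

# First-order gradient ascent on the max-entropy objective.
for _ in range(200):
    _, g_k, g_ls = objective_and_grad(k, log_std, noise)
    k, log_std = k + 0.005 * g_k, log_std + 0.005 * g_ls
J_end = objective_and_grad(k, log_std, noise)[0]
print(f"objective {J_start:.3f} -> {J_end:.3f}, k={k:.3f}, std={math.exp(log_std):.3f}")
```

Holding the noise fixed (common random numbers) turns the objective into a deterministic, smooth function of (k, log_std). That is why the hand-derived gradient can be checked against finite differences. A real implementation would most likely get the backward pass from an autodiff framework or from the simulator itself instead of deriving it by hand.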

arXiv information

Authors	Eliot Xing, Vernon Luk, Jean Oh
Published	2025-02-27 19:05:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
