High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures

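Summary

This work gives an overview of the technical details behind deploying a high-performance reinforcement learning policy on Boston Dynamics Spot using the Spot RL Researcher Development Kit, which provides low-level motor access.
It is the first public demonstration of an end-to-end reinforcement learning policy deployed on Spot hardware, with training code publicly available through NVIDIA IsaacLab and deployment code available through Boston Dynamics.
Wasserstein distance and maximum mean discrepancy are used to quantify the distributional dissimilarity between data collected on hardware and in simulation, which serves as a measure of the sim2real gap.
These measures act as the scoring function for the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which optimizes simulation parameters that are unknown or difficult to measure on Spot.
The modeling and training procedure produces high-quality reinforcement learning policies capable of multiple gaits, including gaits with a flight phase.
The deployed policies achieve locomotion at over 5.2 m/s, more than triple the maximum speed of Spot's default controller, along with robustness to slippery surfaces, disturbance rejection, and overall agility not previously seen on Spot.
The authors detail their method and release their code to support future work with the low-level API.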
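
To make the scoring idea concrete, here is a minimal sketch in Python (NumPy only) of the two distributional measures named above and of how such a measure could score a candidate simulation parameter. It is an illustration under stated assumptions, not the authors' released code: `simulate` is a hypothetical stand-in for a simulator rollout, the "hardware" data is synthetic, the kernel bandwidth is arbitrary, and a plain sweep over candidates replaces CMA-ES to keep the example short.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # For equal-sized 1-D samples, optimal transport pairs sorted values.
    return float(np.mean(np.abs(x - y)))


def _rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_rbf(x, y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy (RBF kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    return float(
        _rbf_kernel(x, x, bandwidth).mean()
        + _rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * _rbf_kernel(x, y, bandwidth).mean()
    )


def simulate(param, n=1000, seed=0):
    """Hypothetical simulator: returns n samples whose spread scales with param."""
    rng = np.random.default_rng(seed)
    return param * rng.standard_normal(n)


# Synthetic "hardware" data, generated with an assumed true parameter of 2.0.
hardware = simulate(2.0, seed=1)


def score(param):
    """Lower is better: distance between simulated and hardware samples."""
    return wasserstein_1d(simulate(param, seed=2), hardware)


# Plain candidate sweep as a stand-in for an evolution strategy such as CMA-ES.
candidates = np.linspace(0.5, 4.0, 8)
best = min(candidates, key=score)
print("best candidate:", best)
print("MMD^2 at best candidate:", mmd_rbf(simulate(best, seed=2), hardware))
```

In a real pipeline, `score` would be handed to an optimizer that proposes parameter vectors, and `mmd_rbf` could replace or complement the Wasserstein term when the logged signals are multi-dimensional.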

Abstract (original)

This work presents an overview of the technical details behind a high performance reinforcement learning policy deployment with the Spot RL Researcher Development Kit for low level motor access on Boston Dynamics Spot. This represents the first public demonstration of an end-to-end reinforcement learning policy deployed on Spot hardware with training code publicly available through Nvidia IsaacLab and deployment code available through Boston Dynamics. We utilize Wasserstein Distance and Maximum Mean Discrepancy to quantify the distributional dissimilarity of data collected on hardware and in simulation to measure our sim2real gap. We use these measures as a scoring function for the Covariance Matrix Adaptation Evolution Strategy to optimize simulated parameters that are unknown or difficult to measure from Spot. Our procedure for modeling and training produces high quality reinforcement learning policies capable of multiple gaits, including a flight phase. We deploy policies capable of over 5.2 m/s locomotion, more than triple Spot's default controller maximum speed, robustness to slippery surfaces, disturbance rejection, and overall agility previously unseen on Spot. We detail our method and release our code to support future work on Spot with the low level API.

arXiv information

Authors	AJ Miller, Fangzhou Yu, Michael Brauckmann, Farbod Farshidian
Published	2025-04-29 13:13:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
