Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking

Abstract

A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.
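The abstract names command following, disturbance recovery, and energy efficiency as benchmark metrics but does not define them on this page. The minimal sketch below assumes one common way to compute such quantities from logged trial data. It is not the paper's benchmark. The function names, the RMSE formulation for command following, the per-trial success rate for disturbance recovery, and the use of dimensionless cost of transport for energy efficiency are all illustrative assumptions.

```python
import math
from typing import Sequence

G = 9.81  # gravitational acceleration in m/s^2

def tracking_rmse(commanded: Sequence[float], measured: Sequence[float]) -> float:
    """Hypothetical command-following metric: RMS error between the commanded
    and measured base velocity (m/s), sampled at the same timestamps."""
    if len(commanded) != len(measured) or not commanded:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(c - m) ** 2 for c, m in zip(commanded, measured)]
    return math.sqrt(sum(squared) / len(squared))

def recovery_rate(stayed_upright: Sequence[bool]) -> float:
    """Hypothetical disturbance-recovery metric: fraction of push trials
    after which the robot remained standing."""
    if not stayed_upright:
        raise ValueError("need at least one trial")
    return sum(stayed_upright) / len(stayed_upright)

def cost_of_transport(energy_j: float, mass_kg: float, distance_m: float) -> float:
    """Assumed energy-efficiency metric: dimensionless cost of transport,
    E / (m * g * d), where E is the electrical energy consumed (J)."""
    if mass_kg <= 0 or distance_m <= 0:
        raise ValueError("mass and distance must be positive")
    return energy_j / (mass_kg * G * distance_m)

if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the paper.
    print(tracking_rmse([0.5, 0.5, 0.5, 0.5], [0.4, 0.5, 0.6, 0.5]))
    print(recovery_rate([True, True, False, True]))
    print(cost_of_transport(energy_j=981.0, mass_kg=50.0, distance_m=2.0))
```

Lower tracking RMSE and cost of transport, and a higher recovery rate, would indicate a better controller under these assumed definitions. Keeping each metric as a separate scalar, rather than a weighted sum, makes the trade-offs between controllers visible directly.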

arXiv information

Authors	Bart van Marum, Aayam Shrestha, Helei Duan, Pranay Dugar, Jeremy Dao, Alan Fern
Published	2024-08-30 08:44:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
