Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning

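Summary

Reinforcement learning often requires extensive training data.
Simulation-to-real transfer is a promising way to address this challenge in robotics.
Differentiable simulators improve sample efficiency by providing exact gradients, but they can be unstable in contact-rich environments and may generalize poorly.
This paper introduces a new approach that integrates sharpness-aware optimization into gradient-based reinforcement learning algorithms.
In simulation, on contact-rich environments, the method significantly improves policy robustness to environmental variations and action perturbations while retaining the sample efficiency of first-order methods.
Specifically, the approach improves tolerance to action noise compared with standard first-order methods and achieves generalization comparable to zeroth-order methods.
The improvement comes from finding flatter minima in the loss landscape, which are associated with better generalization.
The work offers a promising way to balance efficient learning with robust sim-to-real transfer in robotics, and may help close the gap between simulation and real-world performance.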

Summary (original)

Reinforcement learning often requires extensive training data. Simulation-to-real transfer offers a promising approach to address this challenge in robotics. While differentiable simulators offer improved sample efficiency through exact gradients, they can be unstable in contact-rich environments and may lead to poor generalization. This paper introduces a novel approach integrating sharpness-aware optimization into gradient-based reinforcement learning algorithms. Our simulation results demonstrate that our method, tested on contact-rich environments, significantly enhances policy robustness to environmental variations and action perturbations while maintaining the sample efficiency of first-order methods. Specifically, our approach improves action noise tolerance compared to standard first-order methods and achieves generalization comparable to zeroth-order methods. This improvement stems from finding flatter minima in the loss landscape, associated with better generalization. Our work offers a promising solution to balance efficient learning and robust sim-to-real transfer in robotics, potentially bridging the gap between simulation and real-world performance.
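The abstract does not spell out the update rule, so the following is only a minimal sketch of a generic sharpness-aware (SAM-style) step, not the authors' implementation. It assumes the policy parameters are a flat list of floats and that a gradient function is available; in the paper's setting that gradient would presumably come from a differentiable simulator, whereas here a toy quadratic stands in for it. The names `sam_step`, `toy_loss`, `toy_grad`, and `CURVATURE`, and the values of `lr` and `rho`, are hypothetical.

```python
import math


def sam_step(params, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware update (generic sketch, not the paper's code).

    1. Compute the gradient at the current parameters.
    2. Move a distance rho along the normalized gradient (the locally
       worst-case direction to first order).
    3. Take the descent step using the gradient at that perturbed point.
    """
    g = grad_fn(params)
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = rho / (norm + 1e-12)  # small constant avoids division by zero
    perturbed = [p + scale * gi for p, gi in zip(params, g)]
    g_sharp = grad_fn(perturbed)
    return [p - lr * gi for p, gi in zip(params, g_sharp)]


# Toy stand-in for a policy loss: a quadratic with one sharp direction.
# In an RL setting this would be something like the negative return
# (an assumption; the abstract gives no objective).
CURVATURE = [1.0, 10.0]


def toy_loss(params):
    return 0.5 * sum(c * p * p for c, p in zip(CURVATURE, params))


def toy_grad(params):
    return [c * p for c, p in zip(CURVATURE, params)]


params = [1.0, 1.0]
new_params = sam_step(params, toy_grad)
```

Setting `rho=0.0` reduces the step to plain gradient descent, which makes the two extra operations (one perturbation, one additional gradient evaluation) easy to isolate when comparing against a standard first-order baseline.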

arXiv information

Authors	Severin Bochem, Eduardo Gonzalez-Sanchez, Yves Bicker, Gabriele Fadini
Published	2024-11-29 14:25:54+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
