Reward Training Wheels: Adaptive Auxiliary Rewards for Robotics Reinforcement Learning

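Summary

Reinforcement learning (RL) for robotics often relies on carefully designed auxiliary rewards that supplement sparse primary learning objectives and compensate for the lack of large-scale, real-world trial-and-error data.
These auxiliary rewards accelerate learning, but they require significant engineering effort, can introduce human bias, and cannot adapt to the robot's evolving capabilities during training.
This paper introduces Reward Training Wheels (RTW), a teacher-student framework that automates auxiliary reward adaptation for robotics RL.
Specifically, the RTW teacher dynamically adjusts the auxiliary reward weights based on the student's evolving capabilities, determining which auxiliary reward aspects need more or less emphasis to improve the primary objective.
RTW is demonstrated on two challenging robot tasks: navigation in highly constrained spaces and off-road vehicle mobility on vertically challenging terrain.
In simulation, RTW outperforms expert-designed rewards by 2.35% in navigation success rate and improves off-road mobility performance by 122.62%, while training 35% and 3X faster, respectively.
Physical robot experiments further validate RTW's effectiveness: it achieves a perfect success rate (5/5 trials versus 2/5 for expert-designed rewards) and improves vehicle stability, reducing orientation angles by up to 47.4%.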

Abstract (original)

Robotics Reinforcement Learning (RL) often relies on carefully engineered auxiliary rewards to supplement sparse primary learning objectives to compensate for the lack of large-scale, real-world, trial-and-error data. While these auxiliary rewards accelerate learning, they require significant engineering effort, may introduce human biases, and cannot adapt to the robot’s evolving capabilities during training. In this paper, we introduce Reward Training Wheels (RTW), a teacher-student framework that automates auxiliary reward adaptation for robotics RL. To be specific, the RTW teacher dynamically adjusts auxiliary reward weights based on the student’s evolving capabilities to determine which auxiliary reward aspects require more or less emphasis to improve the primary objective. We demonstrate RTW on two challenging robot tasks: navigation in highly constrained spaces and off-road vehicle mobility on vertically challenging terrain. In simulation, RTW outperforms expert-designed rewards by 2.35% in navigation success rate and improves off-road mobility performance by 122.62%, while achieving 35% and 3X faster training efficiency, respectively. Physical robot experiments further validate RTW’s effectiveness, achieving a perfect success rate (5/5 trials vs. 2/5 for expert-designed rewards) and improving vehicle stability with up to 47.4% reduction in orientation angles.
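
The abstract describes auxiliary rewards that are weighted and added to a sparse primary objective, with a teacher adjusting those weights during training. The following is a minimal Python sketch of that general idea only. The abstract does not say how the RTW teacher chooses its adjustments, so the function names, the reward names ("progress", "collision_penalty"), the additive update, and the clipping range are all assumptions made for illustration, not the authors' implementation.

```python
def combine_rewards(primary, aux, weights):
    """Total reward = primary reward + weighted sum of auxiliary rewards.

    `aux` and `weights` are dicts keyed by auxiliary reward name
    (hypothetical names; the abstract does not list the reward terms).
    """
    return primary + sum(weights[name] * value for name, value in aux.items())


def adjust_weights(weights, deltas, low=0.0, high=1.0):
    """Apply per-reward weight changes and clip each weight to [low, high].

    `deltas` stands in for the teacher's decision about which auxiliary
    aspects need more or less emphasis. How the teacher produces these
    values is NOT specified in the abstract; this additive, clipped update
    is an assumed placeholder.
    """
    return {
        name: min(high, max(low, w + deltas.get(name, 0.0)))
        for name, w in weights.items()
    }


if __name__ == "__main__":
    weights = {"progress": 0.5, "collision_penalty": 0.5}
    aux = {"progress": 0.2, "collision_penalty": -1.0}

    print(combine_rewards(1.0, aux, weights))

    # Assumed teacher decision: less emphasis on progress shaping,
    # more emphasis on avoiding collisions.
    weights = adjust_weights(weights, {"progress": -0.2, "collision_penalty": 0.7})
    print(weights)
    print(combine_rewards(1.0, aux, weights))
```

In a real training loop the weights would be updated between student training phases, so the student always optimizes the primary reward plus the currently weighted auxiliary terms.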

arXiv information

Authors	Linji Wang, Tong Xu, Yuanjie Lu, Xuesu Xiao
Published	2025-03-19 22:45:59+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
