STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion

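Summary

Humanoid robots pose a major challenge for artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems.
Designing effective reward functions for deep reinforcement learning (DRL) in this domain remains a critical bottleneck, demanding extensive manual effort, domain expertise, and iterative refinement.
To overcome these challenges, we introduce STRIDE, a novel framework built on agentic engineering that automates reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks.
By combining the structured principles of agentic engineering with large language models (LLMs) for code writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates.
Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving significant improvements in efficiency and task performance.
Using STRIDE-generated rewards, simulated humanoid robots achieve sprint-level locomotion across complex terrains, highlighting the framework's ability to advance DRL workflows and humanoid robotics research.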

Summary (original)

Humanoid robotics presents significant challenges in artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems. Designing effective reward functions for deep reinforcement learning (DRL) in this domain remains a critical bottleneck, demanding extensive manual effort, domain expertise, and iterative refinement. To overcome these challenges, we introduce STRIDE, a novel framework built on agentic engineering to automate reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks. By combining the structured principles of agentic engineering with large language models (LLMs) for code-writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates. Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving significant improvements in efficiency and task performance. Using STRIDE-generated rewards, simulated humanoid robots achieve sprint-level locomotion across complex terrains, highlighting its ability to advance DRL workflows and humanoid robotics research.
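The abstract describes a loop that generates, evaluates, and iteratively refines reward functions. The Python sketch below illustrates that general pattern only; it is not STRIDE's implementation. Every name in it (`Candidate`, `refine_rewards`, and the `toy_*` stand-ins for the LLM, the DRL trainer, and the feedback step) is hypothetical, and the toy scoring rule is an assumption chosen so the example runs without a simulator or a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    source: str   # reward function source code (an LLM would write this)
    score: float  # task metric after training (DRL training would produce this)


def refine_rewards(
    generate: Callable[[Optional[str]], List[str]],
    train_and_score: Callable[[str], float],
    summarize: Callable[[Candidate], str],
    iterations: int = 3,
) -> Candidate:
    """Generic generate -> evaluate -> refine loop (hypothetical sketch)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    best: Optional[Candidate] = None
    feedback: Optional[str] = None
    for _ in range(iterations):
        # 1. Propose candidates; feedback is None on the first (zero-shot) round.
        sources = generate(feedback)
        # 2. Evaluate each candidate by "training" a policy with it.
        scored = [Candidate(src, train_and_score(src)) for src in sources]
        round_best = max(scored, key=lambda c: c.score)
        # 3. Keep the best candidate seen so far.
        if best is None or round_best.score > best.score:
            best = round_best
        # 4. Turn the best result into feedback for the next round.
        feedback = summarize(best)
    return best


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_generate(feedback: Optional[str]) -> List[str]:
    # Stand-in for an LLM: starts at weight 1.0, then moves past the last best weight.
    base = 1.0 if feedback is None else float(feedback) + 0.5
    return [f"reward = {w:.2f} * forward_velocity" for w in (base, base + 0.25)]


def _weight(source: str) -> float:
    # Extract the numeric weight from the toy reward string.
    return float(source.split("=")[1].split("*")[0])


def toy_train_and_score(source: str) -> float:
    # Stand-in for DRL training: the score is highest when the weight is 2.0.
    return -abs(_weight(source) - 2.0)


def toy_summarize(candidate: Candidate) -> str:
    # Stand-in for feedback generation: report the best weight so far.
    return f"{_weight(candidate.source):.2f}"


if __name__ == "__main__":
    best = refine_rewards(toy_generate, toy_train_and_score, toy_summarize)
    print(best.source, best.score)
```

In a real pipeline the three callables would wrap an LLM call, a simulator-based training run, and a summary of training statistics; passing them in as plain functions keeps the loop itself independent of any particular model or simulator.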

arXiv information

Authors	Zhenwei Wu, Jinxiong Lu, Yuxiao Chen, Yunxin Liu, Yueting Zhuang, Luhui Hu
Published	2025-02-10 13:52:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
