Benchmarking the Full-Order Model Optimization Based Imitation in the Humanoid Robot Reinforcement Learning Walk

Summary

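When deep reinforcement learning is used to develop a gait for a bipedal robot, reference trajectories may or may not be used.
Each approach has advantages and disadvantages, and the choice of method is left to the control developer.
This paper investigates how reference trajectories affect locomotion learning and the resulting gaits.
We implemented three gaits of a full-order anthropomorphic robot model with different reward imitation ratios, transferred the control policies sim-to-sim, and compared the gaits in terms of robustness and energy efficiency.
Because the goal was an appealing, natural gait for a humanoid robot, we also conducted a qualitative analysis of the gaits by interviewing people.
According to the experimental results, the most successful approach was the one in which the per-episode average rewards for imitation and for adherence to the commanded velocity remained balanced throughout training.
The gait obtained with this method retains naturalness (median of 3.6 in the user study, compared with 4.0 for the gait trained with imitation only) while remaining nearly as robust as the gait trained without reference trajectories.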
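
The abstract does not give the reward formula, so the following is only a minimal sketch of what a "reward imitation ratio" could look like, assuming a simple convex combination of an imitation term and a command-velocity tracking term. The function names, the exponential reward shapes, and the `scale` values are all hypothetical and are not taken from the paper.

```python
import math


def imitation_reward(reference_joints, actual_joints, scale=2.0):
    """Hypothetical imitation term: 1.0 when the pose matches the reference."""
    squared_error = sum((r - a) ** 2 for r, a in zip(reference_joints, actual_joints))
    return math.exp(-scale * squared_error)


def velocity_tracking_reward(commanded, actual, scale=4.0):
    """Hypothetical tracking term: 1.0 when the commanded velocity is met."""
    return math.exp(-scale * (commanded - actual) ** 2)


def mixed_reward(r_imitation, r_velocity, imitation_ratio):
    """Blend the two terms; imitation_ratio is an assumed weight in [0, 1]."""
    if not 0.0 <= imitation_ratio <= 1.0:
        raise ValueError("imitation_ratio must lie in [0, 1]")
    return imitation_ratio * r_imitation + (1.0 - imitation_ratio) * r_velocity


if __name__ == "__main__":
    r_im = imitation_reward([0.10, -0.30], [0.12, -0.25])
    r_vel = velocity_tracking_reward(commanded=0.5, actual=0.4)
    for ratio in (0.0, 0.5, 1.0):
        print(ratio, mixed_reward(r_im, r_vel, ratio))
```

Under this assumption, a ratio of 1.0 corresponds to training on imitation only and a ratio of 0.0 to training without reference trajectories. The balanced variant described in the abstract is defined by the per-episode averages of the two rewards staying balanced during training, which is not necessarily the same thing as a fixed weight of 0.5.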

Summary (original)

When a gait of a bipedal robot is developed using deep reinforcement learning, reference trajectories may or may not be used. Each approach has its advantages and disadvantages, and the choice of method is up to the control developer. This paper investigates the effect of reference trajectories on locomotion learning and the resulting gaits. We implemented three gaits of a full-order anthropomorphic robot model with different reward imitation ratios, provided sim-to-sim control policy transfer, and compared the gaits in terms of robustness and energy efficiency. In addition, we conducted a qualitative analysis of the gaits by interviewing people, since our task was to create an appealing and natural gait for a humanoid robot. According to the results of the experiments, the most successful approach was the one in which the average value of rewards for imitation and adherence to command velocity per episode remained balanced throughout the training. The gait obtained with this method retains naturalness (median of 3.6 according to the user study) compared to the gait trained with imitation only (median of 4.0), while remaining robust close to the gait trained without reference trajectories.

arXiv information

Authors	Ekaterina Chaikovskaya, Inna Minashina, Vladimir Litvinenko, Egor Davydenko, Dmitry Makarov, Yulia Danik, Roman Gorbachev
Published	2023-12-15 12:57:50+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
