Learning Robotic Policy with Imagined Transition: Mitigating the Trade-off between Robustness and Optimality

Summary

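Existing learning paradigms for quadrupedal locomotion usually rely on extensive domain randomization to narrow the sim2real gap and improve robustness.
Policies are trained across a wide range of environment parameters and sensor noise so that they perform reliably under uncertainty.
However, optimal performance under ideal conditions often conflicts with the need to handle worst-case scenarios, so there is a trade-off between optimality and robustness.
This trade-off pushes the learned policy to prioritize stability under diverse, challenging conditions over efficiency and accuracy under ideal ones, which leads to overly conservative behavior that sacrifices peak performance.
This paper proposes a two-stage framework that mitigates the trade-off by integrating policy learning with imagined transitions.
The framework augments the conventional reinforcement learning (RL) approach by incorporating imagined transitions as demonstrative inputs.
These imagined transitions are derived from an optimal policy and a dynamics model that operate in an idealized setting.
The findings indicate that this approach significantly reduces the negative impact that domain randomization has on existing RL algorithms.
It leads to faster training, lower in-distribution tracking error, and better out-of-distribution robustness.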
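
To make the domain randomization step concrete, here is a minimal sketch of per-episode parameter sampling. The parameter names (`friction`, `payload_kg`, `obs_noise_std`), their ranges, and the helper functions are all assumptions chosen for illustration; the abstract does not say which parameters the authors randomize or over what ranges.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvParams:
    """Hypothetical per-episode simulator parameters (illustrative names)."""
    friction: float       # ground friction coefficient
    payload_kg: float     # extra mass attached to the robot base
    obs_noise_std: float  # std of Gaussian noise added to observations


# Assumed ideal setting and randomization ranges; not taken from the paper.
IDEAL = EnvParams(friction=1.0, payload_kg=0.0, obs_noise_std=0.0)
RANGES = {
    "friction": (0.3, 1.5),
    "payload_kg": (-1.0, 3.0),
    "obs_noise_std": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> EnvParams:
    """Draw one randomized environment, uniformly within each range."""
    return EnvParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def noisy_observation(obs: list[float], params: EnvParams, rng: random.Random) -> list[float]:
    """Corrupt a clean observation with the sampled sensor noise."""
    return [x + rng.gauss(0.0, params.obs_noise_std) for x in obs]


rng = random.Random(0)
episode_params = [sample_params(rng) for _ in range(4)]  # one draw per training episode
```

A policy trained only on `IDEAL` can specialize to that one setting, whereas a policy trained on draws from `sample_params` has to behave acceptably across the whole range. That difference is the source of the trade-off described above.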

Abstract (original)

Existing quadrupedal locomotion learning paradigms usually rely on extensive domain randomization to alleviate the sim2real gap and enhance robustness. It trains policies with a wide range of environment parameters and sensor noises to perform reliably under uncertainty. However, since optimal performance under ideal conditions often conflicts with the need to handle worst-case scenarios, there is a trade-off between optimality and robustness. This trade-off forces the learned policy to prioritize stability in diverse and challenging conditions over efficiency and accuracy in ideal ones, leading to overly conservative behaviors that sacrifice peak performance. In this paper, we propose a two-stage framework that mitigates this trade-off by integrating policy learning with imagined transitions. This framework enhances the conventional reinforcement learning (RL) approach by incorporating imagined transitions as demonstrative inputs. These imagined transitions are derived from an optimal policy and a dynamics model operating within an idealized setting. Our findings indicate that this approach significantly mitigates the domain randomization-induced negative impact of existing RL algorithms. It leads to accelerated training, reduced tracking errors within the distribution, and enhanced robustness outside the distribution.
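
One way to picture an imagined transition is as a one-step rollout of an ideal-setting policy through an ideal-setting dynamics model, which is then handed to the policy being trained as extra input. The sketch below is a toy under stated assumptions, not the authors' implementation: the 1-D velocity-tracking task, the proportional controller standing in for the optimal policy, the frictionless unit-mass model, and the choice to concatenate the imagined action and next state onto the observation are all hypothetical.

```python
from typing import Callable, List, Tuple

State = List[float]   # [velocity, commanded_velocity]
Action = List[float]  # [acceleration]
Transition = Tuple[State, Action, State]


def ideal_policy(state: State) -> Action:
    """Stand-in for a policy trained in the ideal setting (a proportional controller)."""
    velocity, command = state
    return [2.0 * (command - velocity)]


def ideal_dynamics(state: State, action: Action, dt: float = 0.1) -> State:
    """Stand-in for a dynamics model of the ideal setting: unit mass, no friction, no noise."""
    velocity, command = state
    return [velocity + dt * action[0], command]


def imagine(state: State,
            policy: Callable[[State], Action],
            dynamics: Callable[[State, Action], State]) -> Transition:
    """Roll the ideal policy one step through the ideal dynamics model."""
    action = policy(state)
    return (state, action, dynamics(state, action))


def augment_observation(obs: State, imagined: Transition) -> List[float]:
    """Append the imagined action and next state to the observation (assumed design)."""
    _, action, next_state = imagined
    return obs + action + next_state


obs = [0.0, 1.0]  # robot at rest, commanded to reach 1.0
imagined = imagine(obs, ideal_policy, ideal_dynamics)
policy_input = augment_observation(obs, imagined)
```

In this toy, the policy trained under randomization would receive `policy_input` instead of `obs`, so at every step it sees what the ideal-setting policy would do and where that action would lead under ideal dynamics.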

arXiv information

Authors	Wei Xiao, Shangke Lyu, Zhefei Gong, Renjie Wang, Donglin Wang
Published	2025-03-13 15:52:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
