Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control

Summary

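Diffusion models excel at capturing complex distributions from large-scale datasets and offer a promising solution for quadrupedal locomotion control. However, the robustness of a diffusion planner inherently depends on the diversity of the pre-collected dataset. To mitigate this problem, we propose a two-stage learning framework that improves the capability of the diffusion planner under a limited, reward-agnostic dataset. In the offline stage, the diffusion planner learns the joint distribution of state-action sequences from an expert dataset without using reward labels. Then, starting from the trained offline planner, we run online interaction in a simulation environment, which substantially diversifies the original behavior and improves robustness. Specifically, we propose a novel weak preference labeling method that uses neither ground-truth rewards nor human preferences. The proposed method shows superior stability and velocity tracking accuracy for pacing, trotting, and bounding gaits at different speeds, and can be transferred zero-shot to a real Unitree Go1 robot. The project website for this paper is https://shangjaven.github.io/preference-aligned-diffusion-legged.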
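In the offline stage the planner is trained purely as a generative model of expert state-action sequences, so its objective needs no reward term. A minimal sketch of such an objective, assuming a standard DDPM noise-prediction loss with a linear beta schedule, is below; the paper's actual parameterization, schedule, horizon, and network are not given in this summary. All function names are hypothetical, as are the 36-dimensional state and 16-step horizon; the 12-dimensional action is chosen to match the Go1's 12 actuated joints, but is still an assumption.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def linear_alpha_bar(num_steps: int = 100, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> Vector:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear DDPM beta schedule (assumed)."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def flatten_sequence(states: Sequence[Vector], actions: Sequence[Vector]) -> Vector:
    """Interleave per-step states and actions into one joint vector [s_0, a_0, s_1, a_1, ...]."""
    if len(states) != len(actions):
        raise ValueError("states and actions must cover the same horizon")
    x0: Vector = []
    for s, a in zip(states, actions):
        x0.extend(s)
        x0.extend(a)
    return x0


def denoising_loss(eps_model: Callable[[Vector, int], Vector],
                   batch: Sequence[Vector], alpha_bar: Vector,
                   rng: random.Random) -> float:
    """Reward-free noise-prediction (epsilon) MSE over a batch of clean sequences."""
    total, count = 0.0, 0
    for x0 in batch:
        t = rng.randrange(len(alpha_bar))  # sample one diffusion step per sequence
        a = alpha_bar[t]
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
        # Forward process q(x_t | x_0) = N(sqrt(a) * x_0, (1 - a) * I)
        x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
        pred = eps_model(x_t, t)
        total += sum((p - n) ** 2 for p, n in zip(pred, noise))
        count += len(x0)
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    horizon, state_dim, action_dim = 16, 36, 12  # hypothetical sizes

    def random_vec(n: int) -> Vector:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Stand-in "expert" data: random numbers, only to exercise the shapes.
    batch = [flatten_sequence([random_vec(state_dim) for _ in range(horizon)],
                              [random_vec(action_dim) for _ in range(horizon)])
             for _ in range(32)]

    def zero_model(x_t: Vector, t: int) -> Vector:
        return [0.0] * len(x_t)  # untrained placeholder denoiser

    # With a zero predictor the loss is the mean of n^2 for n ~ N(0, 1), i.e. close to 1.0.
    print(denoising_loss(zero_model, batch, linear_alpha_bar(), rng))
```

Treating each step's state and action as one joint vector is what lets a single denoiser model their joint distribution; a real planner would replace `zero_model` with a trained network.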

Summary (original)

Diffusion models demonstrate superior performance in capturing complex distributions from large-scale datasets, providing a promising solution for quadrupedal locomotion control. However, the robustness of the diffusion planner is inherently dependent on the diversity of the pre-collected datasets. To mitigate this issue, we propose a two-stage learning framework to enhance the capability of the diffusion planner under a limited (reward-agnostic) dataset. Through the offline stage, the diffusion planner learns the joint distribution of state-action sequences from expert datasets without using reward labels. Subsequently, we perform online interaction in the simulation environment based on the trained offline planner, which significantly diversifies the original behavior and thus improves robustness. Specifically, we propose a novel weak preference labeling method that requires neither ground-truth rewards nor human preferences. The proposed method exhibits superior stability and velocity tracking accuracy in pacing, trotting, and bounding gaits at different speeds and can perform a zero-shot transfer to the real Unitree Go1 robot. The project website for this paper is at https://shangjaven.github.io/preference-aligned-diffusion-legged.
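The online stage aligns the planner with pairwise preferences rather than a reward. The abstract does not spell out either the alignment objective or the weak labeling rule, so the sketch below swaps in the Diffusion-DPO loss of Wallace et al. (2023), one known way to fine-tune a diffusion model on preferred/rejected pairs against a frozen reference model (here, the offline planner). It is not presented as the paper's method. `label_pair`, its `score_fn`, and the steps-before-fall score in the usage are hypothetical placeholders, not the paper's weak preference labeling.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def squared_error(noise: Sequence[float], pred: Sequence[float]) -> float:
    """||eps - eps_hat||^2 for one noised state-action sequence."""
    return sum((n - p) ** 2 for n, p in zip(noise, pred))


def label_pair(traj_a: T, traj_b: T, score_fn: Callable[[T], float]) -> Tuple[T, T]:
    """Return (preferred, rejected). score_fn is a HYPOTHETICAL weak signal, not a reward."""
    return (traj_a, traj_b) if score_fn(traj_a) >= score_fn(traj_b) else (traj_b, traj_a)


def diffusion_dpo_loss(err_theta_w: float, err_ref_w: float,
                       err_theta_l: float, err_ref_l: float,
                       beta: float = 1.0) -> float:
    """Diffusion-DPO loss (Wallace et al., 2023) for one pair at one sampled diffusion step.

    err_theta_*: squared noise-prediction error of the planner being fine-tuned.
    err_ref_*:   the same error for the frozen offline planner (reference model).
    w / l:       preferred / rejected sequence; err_theta_w and err_ref_w must use the
                 same noised x_t^w (same t and noise), and likewise for l.
    The timestep weighting of the original objective is folded into beta.
    """
    # Negative when the fine-tuned planner improves more on w than on l, relative to the reference.
    margin = (err_theta_w - err_ref_w) - (err_theta_l - err_ref_l)
    # -log sigmoid(-beta * margin) == softplus(beta * margin)
    return softplus(beta * margin)


if __name__ == "__main__":
    # Hypothetical weak score: steps survived before a fall (stand-in only).
    rollouts = [{"steps_before_fall": 1000}, {"steps_before_fall": 240}]
    preferred, rejected = label_pair(rollouts[0], rollouts[1],
                                     lambda r: r["steps_before_fall"])
    print(preferred is rollouts[0])  # True

    # Made-up errors: the planner already fits w better and l worse than the reference.
    loss = diffusion_dpo_loss(0.8, 1.0, 1.1, 1.0, beta=2.0)
    print(loss < math.log(2))  # True: below the log 2 value when theta matches the reference
```

Measuring each error relative to the frozen offline planner keeps the fine-tuned planner anchored to the expert behavior while the preference term pushes it toward the preferred rollouts; `beta` trades off these two pressures.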

arXiv information

Authors	Xinyi Yuan,Zhiwei Shang,Zifan Wang,Chenkai Wang,Zhao Shan,Meixin Zhu,Chenjia Bai,Xuelong Li,Weiwei Wan,Kensuke Harada
Published	2025-03-03 14:24:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
