Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning

Summary

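Legged robots can traverse terrain that wheeled robots cannot reach, which makes them well suited to search and rescue and to space exploration.
However, current control methods often struggle to generalize across diverse, unstructured environments.
This paper introduces a new framework for agile legged locomotion that combines multi-expert distillation with reinforcement learning (RL) fine-tuning to achieve robust generalization.
First, terrain-specific expert policies are trained to develop specialized locomotion skills.
These policies are then distilled into a unified foundation policy via the DAgger algorithm.
The distilled policy is subsequently fine-tuned with RL on a broader set of terrains, including real-world 3D scans.
The framework allows further adaptation to new terrains through repeated fine-tuning.
The proposed policy uses depth images as exteroceptive input, enabling robust navigation across diverse, unstructured terrain.
Experimental results show significant performance improvements over existing methods for synthesizing multi-terrain skills into a single controller.
Deployment on the ANYmal D robot validates the policy's ability to navigate complex environments with agility and robustness, setting a new benchmark for legged robot locomotion.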

Abstract (original)

Legged robots are well-suited for navigating terrains inaccessible to wheeled robots, making them ideal for applications in search and rescue or space exploration. However, current control methods often struggle to generalize across diverse, unstructured environments. This paper introduces a novel framework for agile locomotion of legged robots by combining multi-expert distillation with reinforcement learning (RL) fine-tuning to achieve robust generalization. Initially, terrain-specific expert policies are trained to develop specialized locomotion skills. These policies are then distilled into a unified foundation policy via the DAgger algorithm. The distilled policy is subsequently fine-tuned using RL on a broader terrain set, including real-world 3D scans. The framework allows further adaptation to new terrains through repeated fine-tuning. The proposed policy leverages depth images as exteroceptive inputs, enabling robust navigation across diverse, unstructured terrains. Experimental results demonstrate significant performance improvements over existing methods in synthesizing multi-terrain skills into a single controller. Deployment on the ANYmal D robot validates the policy’s ability to navigate complex environments with agility and robustness, setting a new benchmark for legged robot locomotion.
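
The distillation step described in the abstract can be illustrated with a toy version of the DAgger loop. This is a minimal sketch under strong assumptions, not the authors' implementation: the terrain names, the scalar state, the linear "experts", the `LinearStudent` class, and the simple dynamics are all hypothetical stand-ins, and the student here is conditioned on a terrain label rather than on depth images. It only shows the general pattern: roll out the current student, label the visited states with the matching terrain expert, aggregate the data, and refit one shared policy.

```python
import random

# Hypothetical per-terrain experts: each maps a scalar state x to an action.
# In the paper these would be trained RL policies; here they are fixed gains.
EXPERT_GAINS = {"stairs": -1.5, "gaps": -0.5, "rubble": -1.0}


def expert_action(terrain, x):
    """Query the expert that specializes in the given terrain."""
    return EXPERT_GAINS[terrain] * x


class LinearStudent:
    """A single policy covering all terrains (toy: one linear gain per terrain)."""

    def __init__(self, terrains):
        self.gains = {t: 0.0 for t in terrains}

    def act(self, terrain, x):
        return self.gains[terrain] * x

    def fit(self, dataset):
        # Closed-form least squares per terrain: gain = sum(x*a) / sum(x*x).
        for t in self.gains:
            num = sum(x * a for (tt, x, a) in dataset if tt == t)
            den = sum(x * x for (tt, x, a) in dataset if tt == t)
            if den > 0.0:
                self.gains[t] = num / den


def rollout(student, terrain, rng, beta, steps=20):
    """Visit states under an expert/student mixture and record expert labels."""
    x = rng.uniform(-1.0, 1.0)
    samples = []
    for _ in range(steps):
        label = expert_action(terrain, x)      # the expert always provides the label
        samples.append((terrain, x, label))
        # With probability beta the expert drives, otherwise the student does.
        a = label if rng.random() < beta else student.act(terrain, x)
        x = x + 0.1 * a + rng.gauss(0.0, 0.01)  # assumed toy dynamics
    return samples


def dagger(iterations=5, seed=0):
    """DAgger: aggregate expert-labeled data from student rollouts, then refit."""
    rng = random.Random(seed)
    student = LinearStudent(EXPERT_GAINS)
    dataset = []
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.0           # expert drives only in the first pass
        for terrain in EXPERT_GAINS:
            dataset.extend(rollout(student, terrain, rng, beta))
        student.fit(dataset)
    return student


if __name__ == "__main__":
    student = dagger()
    for terrain, gain in student.gains.items():
        print(terrain, round(gain, 3), "expert:", EXPERT_GAINS[terrain])
```

Because the student is trained on states it visits itself, the aggregated dataset covers the student's own state distribution rather than only the experts'. The RL fine-tuning stage mentioned in the abstract is not sketched here.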

arXiv information

Authors	Nikita Rudin, Junzhe He, Joshua Aurand, Marco Hutter
Published	2025-05-16 12:07:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
