RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

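Summary

Recent advances in robotic foundation models have made it possible to develop generalist policies that adapt to diverse tasks.
These models show impressive flexibility, but their performance depends heavily on the quality of their training data.
In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that uses reinforcement learning to generate high-quality training data for fine-tuning generalist policies.
Through extensive real-world experiments on precise manipulation tasks such as connector insertion and assembly, we demonstrate that generalist policies trained on RL-generated data consistently outperform those trained on human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks.
We also provide a detailed analysis showing that this performance gain stems from both optimized action distributions and improved state coverage.
Our results suggest that combining task-specific RL with generalist policy distillation is a promising approach for developing more capable and efficient robotic manipulation systems that retain the flexibility of foundation models while achieving the performance of specialized controllers.
Videos and code are available on the project website https://generalist-distillation.github.io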

Summary (original)

Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website https://generalist-distillation.github.io

arXiv information

Authors	Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
Published	2024-12-13 04:57:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
