Learning from Massive Human Videos for Universal Humanoid Pose Control

Summary

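Scalable learning for humanoid robots is essential for deploying them in real-world applications.
Conventional approaches rely mainly on reinforcement learning or teleoperation to achieve whole-body control, but they are often limited by the diversity of simulated environments and the high cost of collecting demonstrations.
In contrast, human videos are ubiquitous and offer an untapped source of semantic and motion information that could substantially improve the generalization ability of humanoid robots.
This paper introduces Humanoid-X, a large-scale dataset designed to exploit this abundant data, containing more than 20 million humanoid robot poses with corresponding text-based motion descriptions.
Humanoid-X is curated through a comprehensive pipeline: data mining from the Internet, video caption generation, motion retargeting from humans to humanoid robots, and policy learning for real-world deployment.
Using Humanoid-X, we further train a large humanoid model, UH-1, which takes text instructions as input and outputs the corresponding actions to control a humanoid robot.
Extensive simulated and real-world experiments confirm that our scalable training approach leads to superior generalization in text-based humanoid control, marking an important step toward adaptable, real-world-ready humanoid robots.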

Abstract (original)

Scalable learning of humanoid robots is crucial for their deployment in real-world applications. While traditional approaches primarily rely on reinforcement learning or teleoperation to achieve whole-body control, they are often limited by the diversity of simulated environments and the high costs of demonstration collection. In contrast, human videos are ubiquitous and present an untapped source of semantic and motion information that could significantly enhance the generalization capabilities of humanoid robots. This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions, designed to leverage this abundant data. Humanoid-X is curated through a comprehensive pipeline: data mining from the Internet, video caption generation, motion retargeting of humans to humanoid robots, and policy learning for real-world deployment. With Humanoid-X, we further train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot. Extensive simulated and real-world experiments validate that our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.
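The curation steps named in the abstract (captioning, then retargeting human motion to a humanoid) can be pictured as a chain of stages that turns one video into a text-and-pose pair. The following is only a minimal sketch under stated assumptions: `HumanoidSample`, `build_sample`, the motion-extraction stage, and the toy stand-in functions are all hypothetical names and placeholders chosen for illustration, not the authors' code or the actual Humanoid-X format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HumanoidSample:
    """Hypothetical dataset entry: a text description paired with robot poses."""
    caption: str
    poses: List[List[float]]  # one joint-position vector per frame (assumed layout)


def build_sample(
    video_id: str,
    caption_fn: Callable[[str], str],
    motion_fn: Callable[[str], List[List[float]]],
    retarget_fn: Callable[[List[float]], List[float]],
) -> HumanoidSample:
    """Run the assumed stages in order: caption, extract motion, retarget."""
    caption = caption_fn(video_id)
    human_motion = motion_fn(video_id)
    # Map every human frame to a robot pose, one frame at a time.
    poses = [retarget_fn(frame) for frame in human_motion]
    return HumanoidSample(caption=caption, poses=poses)


# Toy stand-ins so the sketch runs; real stages would be learned models.
def toy_caption(video_id: str) -> str:
    return f"a person waves ({video_id})"


def toy_motion(video_id: str) -> List[List[float]]:
    return [[0.0, 1.0], [0.5, 1.5]]


def toy_retarget(frame: List[float]) -> List[float]:
    # Clamp each value to an assumed joint range of [-1, 1].
    return [max(-1.0, min(1.0, x)) for x in frame]


sample = build_sample("clip_001", toy_caption, toy_motion, toy_retarget)
print(sample.caption, sample.poses)
```

Passing each stage in as a function keeps the sketch independent of any particular captioning or retargeting model; the policy-learning step mentioned in the abstract is left out because the abstract gives no detail about it.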

arXiv information

Authors	Jiageng Mao, Siheng Zhao, Siqi Song, Tianheng Shi, Junjie Ye, Mingtong Zhang, Haoran Geng, Jitendra Malik, Vitor Guizilini, Yue Wang
Published	2024-12-18 18:59:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
