Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

Abstract

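This paper addresses the limitations of current humanoid robot control frameworks, which rely mainly on reactive mechanisms and, because of data scarcity, lack the ability to interact autonomously.
We propose Humanoid-VLA, a new framework that integrates language understanding, egocentric scene perception, and motion control to enable universal humanoid control.
Humanoid-VLA starts with language-motion pre-alignment on non-egocentric human motion datasets paired with text descriptions, so that the model learns universal motion patterns and action semantics.
Egocentric visual context is then incorporated through parameter-efficient, video-conditioned fine-tuning, which enables context-aware motion generation.
In addition, we introduce a self-supervised data augmentation strategy that automatically generates pseudo-annotations derived directly from motion data.
This process converts raw motion sequences into informative question-answer pairs and makes it possible to use large-scale unlabeled video data effectively.
Extensive experiments built on whole-body control architectures show that Humanoid-VLA carries out object interaction and environment exploration tasks with improved contextual awareness, demonstrating a more human-like capacity for adaptive and intelligent engagement.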
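
The abstract does not say how the pseudo-annotations are constructed, so the following is only a minimal sketch of one way a raw motion sequence could be turned into a question-answer pair without human labels. It assumes the motion has already been discretized into integer tokens and uses span masking as the self-supervised task; the names `QAPair`, `make_infill_pair`, and the `<mask>` placeholder are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

MASK = "<mask>"  # hypothetical placeholder token, not from the paper


@dataclass
class QAPair:
    question: str
    answer: str


def make_infill_pair(motion_tokens: List[int], start: int, length: int) -> QAPair:
    """Hide a contiguous span of motion tokens and ask for it back.

    The supervision signal comes from the motion data itself, so no
    human-written annotation is needed (assumed task, for illustration).
    """
    if length <= 0 or start < 0 or start + length > len(motion_tokens):
        raise ValueError("mask span must lie inside the sequence")

    # Replace the chosen span with mask placeholders.
    masked = [
        MASK if start <= i < start + length else str(tok)
        for i, tok in enumerate(motion_tokens)
    ]
    # The hidden tokens become the answer.
    hidden = motion_tokens[start:start + length]

    question = "Fill in the masked part of this motion: " + " ".join(masked)
    answer = " ".join(str(tok) for tok in hidden)
    return QAPair(question, answer)


pair = make_infill_pair([12, 7, 7, 31, 5, 18], start=2, length=3)
print(pair.question)
print(pair.answer)
```

Because the answer is cut from the sequence itself, any unlabeled motion clip can in principle yield many such pairs by varying `start` and `length`.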

Abstract (original)

This paper addresses the limitations of current humanoid robot control frameworks, which primarily rely on reactive mechanisms and lack autonomous interaction capabilities due to data scarcity. We propose Humanoid-VLA, a novel framework that integrates language understanding, egocentric scene perception, and motion control, enabling universal humanoid control. Humanoid-VLA begins with language-motion pre-alignment using non-egocentric human motion datasets paired with textual descriptions, allowing the model to learn universal motion patterns and action semantics. We then incorporate egocentric visual context through parameter-efficient video-conditioned fine-tuning, enabling context-aware motion generation. Furthermore, we introduce a self-supervised data augmentation strategy that automatically generates pseudo-annotations directly derived from motion data. This process converts raw motion sequences into informative question-answer pairs, facilitating the effective use of large-scale unlabeled video data. Built upon whole-body control architectures, extensive experiments show that Humanoid-VLA achieves object interaction and environment exploration tasks with enhanced contextual awareness, demonstrating a more human-like capacity for adaptive and intelligent engagement.

arXiv information

Authors	Pengxiang Ding, Jianfei Ma, Xinyang Tong, Binghong Zou, Xinxin Luo, Yiguo Fan, Ting Wang, Hongchao Lu, Panzhong Mo, Jinxin Liu, Yuefan Wang, Huaicheng Zhou, Wenshuo Feng, Jiacheng Liu, Siteng Huang, Donglin Wang
Published	2025-02-21 08:09:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
