HumanTOMATO: Text-aligned Whole-body Motion Generation

Summary

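This work addresses a new task: text-driven whole-body motion generation. Given a text description as input, the goal is to generate high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
Previous work on text-driven motion generation has two main limitations. First, it overlooks the key role of fine-grained hand and face control in vivid whole-body motion generation. Second, it lacks good alignment between text and motion.
To address these limitations, we propose HumanTOMATO, a text-aligned whole-body motion generation framework. To our knowledge, it is the first attempt toward applicable holistic motion generation in this research area.
Our solution to this challenging task includes two key designs:
(1) a Holistic Hierarchical VQ-VAE (aka H$^2$VQ) and a Hierarchical-GPT, which reconstruct and generate fine-grained body and hand motion using two structured codebooks;
(2) a pre-trained text-motion-alignment model that helps the generated motion align explicitly with the input text description.
Comprehensive experiments verify that our model has significant advantages in both the quality of the generated motions and their alignment with the text.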

Summary (original)

This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously. Previous works on text-driven motion generation tasks mainly have two limitations: they ignore the key role of fine-grained hand and face controlling in vivid whole-body motion generation, and lack a good alignment between text and motion. To address such limitations, we propose a Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which is the first attempt to our knowledge towards applicable holistic motion generation in this research area. To tackle this challenging task, our solution includes two key designs: (1) a Holistic Hierarchical VQ-VAE (aka H$^2$VQ) and a Hierarchical-GPT for fine-grained body and hand motion reconstruction and generation with two structured codebooks; and (2) a pre-trained text-motion-alignment model to help generated motion align with the input textual description explicitly. Comprehensive experiments verify that our model has significant advantages in both the quality of generated motions and their alignment with text.

arXiv information

Authors	Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
Published	2023-10-19 17:59:46+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
