Plan, Posture and Go: Towards Open-World Text-to-Motion Generation

Summary

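Conventional text-to-motion generation methods are usually trained on a limited set of text-motion pairs, which makes it hard for them to generalize to open-world scenarios.
Some works use the CLIP model to align the motion space with the text space, aiming to generate motion from natural-language motion descriptions.
However, these approaches are still restricted to generating a limited range of unrealistic in-place motions.
To address these issues, we propose PRO-Motion, a divide-and-conquer framework that consists of three modules: a motion planner, a posture-diffuser, and a go-diffuser.
The motion planner instructs a Large Language Model (LLM) to generate a sequence of scripts describing the key postures in the target motion.
Unlike natural language, the scripts follow very simple text templates yet can describe all possible postures.
This significantly reduces the complexity of the posture-diffuser, which transforms a script into a posture, and paves the way for open-world generation.
Finally, the go-diffuser, implemented as another diffusion model, estimates whole-body translations and rotations for all postures, resulting in realistic motions.
Experimental results show that our method outperforms its counterparts and can generate diverse and realistic motions from complex open-world prompts such as "Experiencing a profound sense of joy".
The project page is available at https://moonsliu.github.io/Pro-Motion.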
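
The data flow between the three modules can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Posture`, `MotionFrame`, `plan_motion`, `generate_motion`, and the `stub_*` functions) is hypothetical, the pose and rotation formats are assumptions, and the two diffusion models are replaced by trivial stubs so that only the planner → posture-diffuser → go-diffuser ordering is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Posture:
    script: str        # templated script the posture was generated from
    pose: List[float]  # placeholder pose parameters (format is an assumption)


@dataclass
class MotionFrame:
    posture: Posture
    translation: Vec3  # whole-body translation
    rotation: Vec3     # whole-body rotation (representation is an assumption)


def plan_motion(prompt: str, llm: Callable[[str], str]) -> List[str]:
    """Stage 1: ask an LLM for key-posture scripts, one script per line."""
    reply = llm(f"List the key postures for the motion: {prompt}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def generate_motion(
    prompt: str,
    llm: Callable[[str], str],
    posture_diffuser: Callable[[str], Posture],
    go_diffuser: Callable[[List[Posture]], List[MotionFrame]],
) -> List[MotionFrame]:
    """Chain the three stages: plan scripts, pose each script, add global motion."""
    scripts = plan_motion(prompt, llm)                  # stage 1: motion planner
    postures = [posture_diffuser(s) for s in scripts]   # stage 2: script -> posture
    return go_diffuser(postures)                        # stage 3: translation/rotation


# Hypothetical stand-ins so the sketch runs without any model.
def stub_llm(instruction: str) -> str:
    return "The arms are raised above the head.\nThe knees are slightly bent.\n"


def stub_posture_diffuser(script: str) -> Posture:
    return Posture(script=script, pose=[0.0, 0.0, 0.0])


def stub_go_diffuser(postures: List[Posture]) -> List[MotionFrame]:
    zero = (0.0, 0.0, 0.0)
    return [MotionFrame(posture=p, translation=zero, rotation=zero) for p in postures]


if __name__ == "__main__":
    frames = generate_motion(
        "Experiencing a profound sense of joy",
        stub_llm,
        stub_posture_diffuser,
        stub_go_diffuser,
    )
    for frame in frames:
        print(frame.posture.script, frame.translation, frame.rotation)
```

Passing the three stages in as callables keeps the sketch independent of any particular LLM or diffusion library; real models would replace the stubs without changing `generate_motion`.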

Abstract (original)

Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP model to align the motion space and the text space, aiming to enable motion generation from natural language motion descriptions. However, they are still constrained to generate limited and unrealistic in-place motions. To address these issues, we present a divide-and-conquer framework named PRO-Motion, which consists of three modules as motion planner, posture-diffuser and go-diffuser. The motion planner instructs Large Language Models (LLMs) to generate a sequence of scripts describing the key postures in the target motion. Differing from natural languages, the scripts can describe all possible postures following very simple text templates. This significantly reduces the complexity of posture-diffuser, which transforms a script to a posture, paving the way for open-world generation. Finally, go-diffuser, implemented as another diffusion model, estimates whole-body translations and rotations for all postures, resulting in realistic motions. Experimental results have shown the superiority of our method with other counterparts, and demonstrated its capability of generating diverse and realistic motions from complex open-world prompts such as ‘Experiencing a profound sense of joy’. The project page is available at https://moonsliu.github.io/Pro-Motion.

arXiv information

Authors	Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong
Published	2023-12-22 17:02:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
