Generating Human Interaction Motions in Scenes with Text Control

Summary

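We introduce TeSMo, a method for text-controlled, scene-aware motion generation based on denoising diffusion models.
Previous text-to-motion methods have focused on isolated characters without considering scenes, because few datasets combine motion, text descriptions, and interactive scenes.
Our approach starts by pre-training a scene-agnostic text-to-motion diffusion model on large-scale motion-capture datasets, with an emphasis on goal-reaching constraints.
We then augment this model with a scene-aware component and fine-tune it on data enriched with detailed scene information, such as the ground plane and object shapes.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
The proposed method generates realistic and diverse human-object interactions, such as navigation and sitting, across different scenes with varying object shapes, orientations, initial body positions, and poses.
Extensive experiments show that our approach outperforms prior techniques in the plausibility of human-scene interactions and in the realism and diversity of the generated motions.
Code will be released at https://research.nvidia.com/labs/toronto-ai/tesmo upon publication of this work.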

Summary (original)

We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. We then enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes. To facilitate training, we embed annotated navigation and interaction motions within scenes. The proposed method produces realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses. Extensive experiments demonstrate that our approach surpasses prior techniques in terms of the plausibility of human-scene interactions, as well as the realism and variety of the generated motions. Code will be released upon publication of this work at https://research.nvidia.com/labs/toronto-ai/tesmo.
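TeSMo is built on denoising diffusion models. As general background only (this is not TeSMo's code, which the abstract says will be released later), the sketch below shows the standard DDPM forward noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a motion clip stored as a list of frames of pose features. The linear beta schedule, its endpoints, and the toy frame layout are assumptions chosen for illustration; the text conditioning, goal-reaching constraints, and scene-aware component described above are not modeled here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed, commonly used DDPM default)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Forward noising q(x_t | x_0) for a motion clip.

    x0 is a list of frames, each a list of pose features (hypothetical layout).
    Returns the noised clip and the Gaussian noise that was added.
    """
    a = abar[t]
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in x0]
    xt = [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(frame, frame_noise)]
        for frame, frame_noise in zip(x0, noise)
    ]
    return xt, noise


# Usage on a hypothetical 4-frame toy trajectory (root x, root z, heading).
rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))
clip = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.05], [0.2, 0.0, 0.1], [0.3, 0.0, 0.15]]
x_noisy, eps = q_sample(clip, t=500, abar=abar, rng=rng)
```

A denoising network is then trained to reverse this process; in a text-to-motion setting it would additionally receive the text prompt (and, per the abstract, scene information during fine-tuning) as conditioning, which this sketch leaves out.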

arXiv information

Authors	Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe
Published	2024-04-16 16:04:38+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
