Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation

Abstract

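Text-driven human motion synthesis has drawn significant attention for its ability to generate intricate movements from abstract text cues with little effort, and it could reshape motion design in film, virtual reality, and computer game development.
Existing methods often rely on 3D motion capture data, which requires special setups. This raises the cost of data acquisition and ultimately limits the diversity and scope of the human motion that can be covered.
In contrast, 2D human videos offer a vast and accessible source of motion data that spans a wider range of styles and activities.
In this paper, we explore using 2D human motion extracted from videos as an alternative data source to improve text-driven 3D motion generation.
Our approach introduces a novel framework that disentangles local joint motion from global movement, so that local motion priors can be learned efficiently from 2D data.
We first train a single-view 2D local motion generator on a large dataset of text-motion pairs.
To extend this model to 3D motion synthesis, we fine-tune the generator with 3D data, turning it into a multi-view generator that predicts view-consistent local joint motion and root dynamics.
Experiments on the HumanML3D dataset and on novel text prompts show that our method uses 2D data efficiently, supports realistic 3D human motion generation, and broadens the range of motion types it can handle.
Our code will be released at https://zju3dv.github.io/Motion-2-to-3/.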

Abstract (original)

Text-driven human motion synthesis is capturing significant attention for its ability to effortlessly generate intricate movements from abstract text cues, showcasing its potential for revolutionizing motion design not only in film narratives but also in virtual reality experiences and computer game development. Existing methods often rely on 3D motion capture data, which require special setups resulting in higher costs for data acquisition, ultimately limiting the diversity and scope of human motion. In contrast, 2D human videos offer a vast and accessible source of motion data, covering a wider range of styles and activities. In this paper, we explore leveraging 2D human motion extracted from videos as an alternative data source to improve text-driven 3D motion generation. Our approach introduces a novel framework that disentangles local joint motion from global movements, enabling efficient learning of local motion priors from 2D data. We first train a single-view 2D local motion generator on a large dataset of text-motion pairs. To enhance this model to synthesize 3D motion, we fine-tune the generator with 3D data, transforming it into a multi-view generator that predicts view-consistent local joint motion and root dynamics. Experiments on the HumanML3D dataset and novel text prompts demonstrate that our method efficiently utilizes 2D data, supporting realistic 3D human motion generation and broadening the range of motion types it supports. Our code will be made publicly available at https://zju3dv.github.io/Motion-2-to-3/.
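
The abstract does not spell out how local joint motion is separated from global movement. As a minimal sketch of one common way to do this, assuming each frame is a list of 2D joint positions and that one joint (here index 0) serves as the root, the pose sequence can be split into a root trajectory and root-relative joint offsets. The function names `split_local_global` and `merge_local_global` and the choice of root joint are hypothetical and are not taken from the paper.

```python
# Illustrative only: a root-relative decomposition of a 2D pose sequence.
# The paper's actual motion representation may differ.

def split_local_global(poses, root_index=0):
    """Split a pose sequence into local (root-relative) joints and a root trajectory.

    poses: list of frames, each frame a list of (x, y) joint positions.
    root_index: index of the joint treated as the root (an assumption).
    """
    local = []
    root_traj = []
    for frame in poses:
        rx, ry = frame[root_index]
        root_traj.append((rx, ry))
        # Subtract the root position so only joint motion relative to the root remains.
        local.append([(x - rx, y - ry) for (x, y) in frame])
    return local, root_traj


def merge_local_global(local, root_traj):
    """Inverse of split_local_global: add the root trajectory back to the local joints."""
    return [
        [(x + rx, y + ry) for (x, y) in frame]
        for frame, (rx, ry) in zip(local, root_traj)
    ]


if __name__ == "__main__":
    # Two frames, two joints; joint 0 is the assumed root.
    poses = [[(1.0, 2.0), (2.0, 3.0)], [(2.0, 2.0), (3.0, 4.0)]]
    local, root_traj = split_local_global(poses)
    print(local)
    print(root_traj)
```

In this form the local part is unchanged when the whole figure is translated in the image plane, which is one reason a decomposition of this kind can make video-derived 2D data easier to learn from.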

arXiv information

Authors	Huaijin Pi, Ruoxi Guo, Zehong Shen, Qing Shuai, Zechen Hu, Zhumei Wang, Yajiao Dong, Ruizhen Hu, Taku Komura, Sida Peng, Xiaowei Zhou
Published	2024-12-17 17:34:52+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
