Rethinking Diffusion for Text-Driven Human Motion Generation

Summary

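Since 2023, discrete generation methods based on vector quantization (VQ) have rapidly become mainstream in human motion generation, outperforming continuous diffusion-based methods, mainly on standard performance metrics.
However, VQ-based methods have inherent limitations.
Representing continuous motion data as a limited set of discrete tokens causes unavoidable information loss, reduces the diversity of the generated motions, and limits their ability to serve effectively as motion priors or generation guidance.
In contrast, because diffusion-based methods generate in a continuous space, they are well suited to addressing these limitations and also offer potential for model scalability.
In this work, we systematically investigate why current VQ-based methods perform well, and examine the limitations of existing diffusion-based methods from the perspective of motion data representation and distribution.
Building on these insights, we preserve the inherent strengths of a diffusion-based human motion generation model and optimize it step by step, drawing inspiration from VQ-based approaches.
Our approach introduces a human motion diffusion model capable of bidirectional masked autoregression, optimized with a reformed data representation and distribution.
We also propose a more robust evaluation method for fairly comparing methods built on different paradigms.
Extensive experiments on benchmark human motion generation datasets show that our method outperforms previous methods and achieves state-of-the-art performance.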
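
The claim that representing continuous motion as a limited set of discrete tokens loses information can be illustrated with a toy nearest-neighbour quantizer. This is a minimal sketch, not the tokenizer used by any method discussed here: the "motion frames" are assumed to be random 3-D vectors, and the codebook is random rather than learned (real VQ-VAE tokenizers learn the codebook jointly with an encoder). The helper names `quantize` and `mean_reconstruction_error` are hypothetical. Whatever the codebook size K, each continuous frame is replaced by one of K fixed vectors, so the reconstruction error is generally non-zero.

```python
import math
import random

def quantize(vec, codebook):
    """Hypothetical helper: return (index, codeword) of the nearest codebook entry (L2)."""
    idx = min(range(len(codebook)),
              key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))
    return idx, codebook[idx]

def mean_reconstruction_error(frames, codebook):
    """Average L2 distance between each continuous frame and its quantized token."""
    total = 0.0
    for f in frames:
        _, q = quantize(f, codebook)
        total += math.dist(f, q)
    return total / len(frames)

rng = random.Random(0)
# Assumption: toy continuous "poses" instead of real motion features.
frames = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
codebook_64 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(64)]
codebook_4 = codebook_64[:4]  # nested subset, so the larger codebook can only help
err_4 = mean_reconstruction_error(frames, codebook_4)
err_64 = mean_reconstruction_error(frames, codebook_64)
print(f"K=4: {err_4:.3f}  K=64: {err_64:.3f}")
```

Because the small codebook is a subset of the large one, the error can only shrink as K grows, but it does not reach zero for continuous inputs that do not coincide with a codeword.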

Summary (original)

Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly dominated human motion generation, primarily surpassing diffusion-based continuous generation methods in standard performance metrics. However, VQ-based methods have inherent limitations. Representing continuous motion data as limited discrete tokens leads to inevitable information loss, reduces the diversity of generated motions, and restricts their ability to function effectively as motion priors or generation guidance. In contrast, the continuous space generation nature of diffusion-based methods makes them well-suited to address these limitations and with even potential for model scalability. In this work, we systematically investigate why current VQ-based methods perform well and explore the limitations of existing diffusion-based methods from the perspective of motion data representation and distribution. Drawing on these insights, we preserve the inherent strengths of a diffusion-based human motion generation model and gradually optimize it with inspiration from VQ-based approaches. Our approach introduces a human motion diffusion model enabled to perform bidirectional masked autoregression, optimized with a reformed data representation and distribution. Additionally, we also propose more robust evaluation methods to fairly assess different-based methods. Extensive experiments on benchmark human motion generation datasets demonstrate that our method excels previous methods and achieves state-of-the-art performances.
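
The abstract does not spell out the architecture, so the following only illustrates the general idea of masked autoregression over continuous (non-quantized) tokens with a per-token diffusion sampler. It is a minimal sketch under loudly simplified assumptions, not the authors' model: each "frame" is a single float rather than a pose vector; the bidirectional transformer is replaced by a hypothetical distance-weighted average of already-generated frames on both sides; and the per-frame diffusion head assumes a Gaussian target so that its probability-flow sampler has a closed form. None of the function names come from the paper.

```python
import math
import random

def toy_context(seq, pos):
    """HYPOTHETICAL stand-in for a bidirectional transformer: a distance-weighted
    average of every already-generated frame, on both sides of `pos`."""
    acc, total_w = 0.0, 0.0
    for j, x in enumerate(seq):
        if x is not None:                  # masked frames contribute nothing
            w = 1.0 / (1 + abs(j - pos))   # nearer frames weigh more
            acc += w * x
            total_w += w
    return acc / total_w if total_w else 0.0

def toy_diffusion_head(mean, rng, std=0.1, sigma_max=10.0, steps=20):
    """HYPOTHETICAL per-frame diffusion head. Assumes the target is N(mean, std^2);
    for that case the probability-flow ODE of a variance-exploding diffusion has
    the closed-form step used below, so the final sample is exact."""
    sigmas = [sigma_max * (1 - k / steps) for k in range(steps + 1)]  # sigma_max -> 0
    x = mean + rng.gauss(0.0, math.sqrt(std ** 2 + sigma_max ** 2))  # sample at highest noise level
    for s_hi, s_lo in zip(sigmas, sigmas[1:]):
        # denoise from noise level s_hi down to s_lo
        x = mean + (x - mean) * math.sqrt(std ** 2 + s_lo ** 2) / math.sqrt(std ** 2 + s_hi ** 2)
    return x

def masked_autoregressive_generate(length, num_iters, seed=0, keyframes=None):
    rng = random.Random(seed)
    seq = [None] * length                         # None marks a masked frame
    for pos, val in (keyframes or {}).items():    # optionally fix known frames
        seq[pos] = float(val)
    masked = [i for i in range(length) if seq[i] is None]
    rng.shuffle(masked)                           # random order, not left-to-right
    for it in range(num_iters):
        chunk = masked[it::num_iters]             # frames revealed at this iteration
        # every frame in the chunk sees the same snapshot of visible frames
        conds = {p: toy_context(seq, p) for p in chunk}
        for p in chunk:
            seq[p] = toy_diffusion_head(conds[p], rng)
    return seq

frames = masked_autoregressive_generate(length=16, num_iters=4, seed=0,
                                        keyframes={0: 1.0, 15: -1.0})
print([round(x, 2) for x in frames])
```

Revealing several frames per iteration in a random order is what separates this from left-to-right autoregression: a frame can be conditioned on frames that come after it in time, which is also why fixing keyframes at both ends works without changing the loop.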

arXiv information

Authors	Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, Huaizu Jiang
Published	2024-11-25 16:59:42+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
