Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

Summary

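Generating dance from music is an important task. Current methods mainly generate joint sequences, so their outputs lack intuitiveness, and they complicate data collection because they require precise joint annotations.
Applying conditional image-to-video generation principles, we introduce the Dance Any Beat Diffusion model, DabFusion, which uses music as the conditional input to create dance videos directly from still images.
This approach pioneers the use of music as a conditioning factor in image-to-video synthesis.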
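Our method proceeds in two stages. First, we train an auto-encoder to predict the latent optical flow between reference and driving frames, which removes the need for joint annotations. Second, we train a U-Net-based diffusion model to generate these latent optical flows, guided by music rhythm encoded by CLAP.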
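The first stage replaces joint annotations with optical flow. As a rough illustration of why a flow field is enough to animate a still image, the sketch below backward-warps a tiny grayscale reference frame with a dense per-pixel flow. It is a simplification under stated assumptions: DabFusion predicts flow in a learned latent space, whereas this toy works directly on pixels stored as nested lists, uses nearest-neighbour sampling, and `warp_backward` is a hypothetical name, not part of the authors' code.

```python
from typing import List, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) sampling offsets


def warp_backward(reference: Image, flow: Flow, fill: float = 0.0) -> Image:
    """Backward-warp `reference` with a dense flow field.

    For every output pixel (x, y), flow[y][x] = (dx, dy) points to the location
    in the reference frame to sample from. Nearest-neighbour sampling keeps this
    sketch dependency-free; learned flow-based animators commonly use bilinear
    sampling instead. Pixels that sample outside the frame get `fill`.
    """
    height, width = len(reference), len(reference[0])
    out: Image = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x, src_y = round(x + dx), round(y + dy)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = reference[src_y][src_x]
    return out


# Every output pixel reads from one pixel to its left: content shifts right.
reference = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shift_right = [[(-1.0, 0.0)] * 3 for _ in range(2)]
print(warp_backward(reference, shift_right))  # [[0.0, 1.0, 2.0], [0.0, 4.0, 5.0]]
```

In a flow-based pipeline of this kind, a sequence of such flows, one per output frame, turns a single reference image into a video; the vacated column shows why real systems also need some way to fill in regions the reference frame does not cover.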
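Although the baseline model can produce high-quality dance videos, it struggles with rhythm alignment.
We enhance the model by adding beat information, which improves synchronization.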
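The abstract does not specify how beat information is represented. One plausible encoding, shown here purely as a hypothetical sketch, converts beat timestamps (for example, from an off-the-shelf beat tracker) into a smooth per-frame signal at the video frame rate and appends each frame's value to the music embedding. The function names `beat_features` and `frame_conditions`, the Gaussian width `sigma`, and the toy embedding values are all assumptions; a real CLAP embedding is a much longer learned vector.

```python
import math
from typing import List, Sequence


def beat_features(beat_times: Sequence[float], num_frames: int, fps: float,
                  sigma: float = 0.05) -> List[float]:
    """Hypothetical per-frame beat saliency in [0, 1].

    Each frame gets exp(-d^2 / (2 sigma^2)), where d is the distance in seconds
    from the frame's timestamp to the nearest beat: a frame exactly on a beat
    scores 1.0 and the value decays smoothly between beats.
    """
    feats = []
    for i in range(num_frames):
        t = i / fps
        d = min((abs(t - b) for b in beat_times), default=math.inf)
        feats.append(math.exp(-d * d / (2 * sigma * sigma)))
    return feats


def frame_conditions(music_embedding: Sequence[float],
                     beats: Sequence[float]) -> List[List[float]]:
    """One conditioning vector per frame: the music embedding plus that frame's beat value."""
    return [list(music_embedding) + [b] for b in beats]


# Beats at 0.0 s and 0.5 s, a 10 fps clip of 6 frames, and a toy 2-D embedding.
beats = beat_features([0.0, 0.5], num_frames=6, fps=10.0)
conditions = frame_conditions([0.3, -0.7], beats)
print(len(conditions), conditions[0])  # 6 [0.3, -0.7, 1.0]
```

A Gaussian bump rather than a hard 0/1 flag gives the model a graded signal for frames that fall near, but not exactly on, a beat; whether that helps is an empirical question the abstract does not address.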
For quantitative evaluation, we introduce a 2D motion-music alignment score (2D-MM Align).
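The abstract only names the 2D-MM Align score without defining it. As an assumption, the sketch below follows the style of the beat-alignment scores commonly reported on AIST++: extract motion beats as local minima of the mean 2D keypoint speed, then average, over motion beats, a Gaussian of the distance to the nearest music beat. It also assumes 2D keypoints are available for every generated frame (for instance from a pose estimator); the abstract does not say how 2D-MM Align obtains its motion signal. `motion_beats`, `alignment_score`, and the `sigma` value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[float, float]]  # 2D keypoints (x, y) for one video frame


def motion_beats(frames: Sequence[Frame], fps: float) -> List[float]:
    """Times (s) where the mean 2D keypoint speed reaches a local minimum."""
    # speeds[k] is the mean keypoint speed between frames k and k + 1.
    speeds = []
    for prev, cur in zip(frames, frames[1:]):
        dists = [math.dist(p, c) for p, c in zip(prev, cur)]
        speeds.append(sum(dists) / len(dists) * fps)
    beats = []
    for k in range(1, len(speeds) - 1):
        if speeds[k] < speeds[k - 1] and speeds[k] < speeds[k + 1]:
            # Timestamp the pause at the midpoint of the slow interval.
            beats.append((k + 0.5) / fps)
    return beats


def alignment_score(motion: Sequence[float], music: Sequence[float],
                    sigma: float = 0.05) -> float:
    """Average over motion beats of exp(-d^2 / (2 sigma^2)).

    d is the gap in seconds to the nearest music beat; sigma (seconds) is an
    arbitrary tolerance chosen for this sketch.
    """
    if not motion or not music:
        return 0.0
    total = 0.0
    for t in motion:
        d = min(abs(t - b) for b in music)
        total += math.exp(-d * d / (2 * sigma * sigma))
    return total / len(motion)


# One keypoint moving right at 10 fps; it slows down between frames 1 and 2.
frames = [[(0.0, 0.0)], [(1.0, 0.0)], [(1.5, 0.0)], [(2.5, 0.0)], [(4.0, 0.0)]]
mb = motion_beats(frames, fps=10.0)
print(mb)  # [0.15]
print(alignment_score(mb, music=[0.15]))  # 1.0
```

Averaging over motion beats asks whether each pause in the motion lands near a music beat; averaging over music beats instead would ask whether every music beat is matched by a pause, so the direction of the average is itself a design choice.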
Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in both the 2D-MM Align score and established metrics.
Video results are available on the project page: https://DabFusion.github.io.

Summary (original)

The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizing conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in 2D-MM Align score and established metrics. Video results can be found on our project page: https://DabFusion.github.io.

arXiv information

Authors	Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai
Published	2024-05-15 11:33:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
