Quantized GAN for Complex Music Generation from Dance Videos

Summary

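Dance2Music-GAN (D2M-GAN) is a new adversarial multimodal framework that generates complex music samples conditioned on dance videos.
The proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany that input.
Most existing conditional music generation work produces specific types of mono-instrumental sound from symbolic audio representations (such as MIDI) and usually relies on predefined music synthesizers. This work instead generates dance music in complex styles (such as pop and breaking) by adopting a Vector Quantized (VQ) audio representation, leveraging both its generality and the high abstraction capacity of its symbolic and continuous counterparts.
The authors run an extensive set of experiments on multiple datasets and follow a comprehensive evaluation protocol to compare the generation quality of the proposal against alternatives.
The quantitative results, which measure music consistency, beat correspondence, and music diversity, demonstrate the effectiveness of the proposed method.
Finally, the authors curate a challenging dance-music dataset of in-the-wild TikTok videos. They use it to further demonstrate the efficacy of the approach in real-world applications, and hope it will serve as a starting point for related future research.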
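
To make the idea of a Vector Quantized representation concrete, the following is a minimal, generic sketch of nearest-neighbour vector quantization in plain Python. It is not the D2M-GAN implementation: the codebook values, the 2-dimensional toy features, and the function names `quantize` and `dequantize` are all assumptions chosen for illustration. In a real system the codebook would be learned and the feature vectors would come from an audio encoder.

```python
# Illustrative sketch only: generic nearest-neighbour vector quantization.
# The codebook, feature values, and function names are hypothetical and
# are not taken from the D2M-GAN paper or code.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]

def dequantize(indices, codebook):
    """Look up the codebook vector for each discrete index."""
    return [codebook[i] for i in indices]

# Toy codebook with three 2-dimensional entries (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Toy continuous features, one vector per audio frame (assumed values).
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

indices = quantize(frames, codebook)          # discrete token sequence
reconstructed = dequantize(indices, codebook)  # approximate continuous features
```

The discrete index sequence plays a role similar to a symbolic representation, while the codebook lookup recovers an approximation of the continuous features, which is the general trade-off the summary alludes to.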

Summary (original)

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos. Our proposed framework takes dance video frames and human body motions as input, and learns to generate music samples that plausibly accompany the corresponding input. Unlike most existing conditional music generation works that generate specific types of mono-instrumental sounds using symbolic audio representations (e.g., MIDI), and that usually rely on pre-defined musical synthesizers, in this work we generate dance music in complex styles (e.g., pop, breaking, etc.) by employing a Vector Quantized (VQ) audio representation, and leverage both its generality and high abstraction capacity of its symbolic and continuous counterparts. By performing an extensive set of experiments on multiple datasets, and following a comprehensive evaluation protocol, we assess the generative qualities of our proposal against alternatives. The attained quantitative results, which measure the music consistency, beats correspondence, and music diversity, demonstrate the effectiveness of our proposed method. Last but not least, we curate a challenging dance-music dataset of in-the-wild TikTok videos, which we use to further demonstrate the efficacy of our approach in real-world applications — and which we hope to serve as a starting point for relevant future research.

arXiv information

Authors	Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov
Published	2022-07-19 17:17:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
