MusicFace: Music-driven Expressive Singing Face Synthesis

Summary

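Synthesizing a vivid, realistic singing face driven by a music signal remains an interesting and challenging problem.
This paper presents a method for the task that produces natural motion of the lips, facial expression, head pose, and eye state.
Because the human voice and the background music are mixed together in an ordinary music audio signal, the authors design a decouple-and-fuse strategy to address the challenge.
The input music audio is first decomposed into a human voice stream and a background music (BGM) stream.
Because the correlation between these two input streams and the dynamics of facial expression, head motion, and eye state is implicit and complex, the relationship is modeled with an attention scheme that fuses the effects of the two streams seamlessly.
To make the generated results more expressive, head movement generation is further decomposed into speed generation and direction generation, and eye state generation into short-time eye blinking and long-time eye closing, each modeled separately.
The authors also build a new SingingFace Dataset to support training and evaluation for this task and to facilitate future work on the topic.
Extensive experiments and a user study show that the proposed method synthesizes vivid singing faces and outperforms state-of-the-art methods both qualitatively and quantitatively.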
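
The split of head movement into speed and direction can be illustrated with a small NumPy sketch. This is only an illustration of the general idea, not the authors' implementation: the function names `split_speed_direction` and `recombine`, the `(T, 3)` yaw/pitch/roll layout, and the use of frame-to-frame differences are all assumptions made here.

```python
import numpy as np

def split_speed_direction(head_pose, eps=1e-8):
    """Split frame-to-frame head-pose changes into speed and direction.

    head_pose: array of shape (T, 3), e.g. yaw/pitch/roll per frame
    (an assumed layout, not taken from the paper).
    """
    delta = np.diff(head_pose, axis=0)       # per-frame change, shape (T-1, 3)
    speed = np.linalg.norm(delta, axis=1)    # magnitude of each change, shape (T-1,)
    # Unit direction; frames with no motion get a zero vector.
    direction = delta / np.maximum(speed, eps)[:, None]
    return speed, direction

def recombine(start_pose, speed, direction):
    """Rebuild the pose sequence from a start pose, speeds, and directions."""
    delta = speed[:, None] * direction
    return np.vstack([start_pose, start_pose + np.cumsum(delta, axis=0)])
```

Under these assumptions, `recombine(head_pose[0], *split_speed_direction(head_pose))` reproduces the input sequence, so the two components can be predicted separately and merged afterwards.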

Abstract (original)

It is still an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music signal. In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states. Due to the coupling of the mixed information of human voice and background music in common signals of music audio, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into human voice stream and background music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we propose to decompose head movements generation into speed generation and direction generation, and decompose eye states generation into the short-time eye blinking generation and the long-time eye closing generation to model them separately. We also build a novel SingingFace Dataset to support the training and evaluation of this task, and to facilitate future works on this topic. Extensive experiments and user study show that our proposed method is capable of synthesizing vivid singing face, which is better than state-of-the-art methods qualitatively and quantitatively.
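
The distinction between short-time eye blinking and long-time eye closing can be pictured as a run-length split over a per-frame "eye closed" mask. The sketch below is hypothetical: the function name `split_eye_closures`, the binary mask input, and the `max_blink_frames` threshold are assumptions chosen for illustration, and the abstract does not say how the authors separate the two cases.

```python
from itertools import groupby

def split_eye_closures(closed, max_blink_frames=8):
    """Split a per-frame closed-eye mask into short blinks and long closures.

    closed: sequence of 0/1 (or False/True) values, one per video frame.
    max_blink_frames: assumed threshold; closed runs up to this length
    count as blinks, longer runs count as long eye closing.
    Returns two lists of (start, end) frame ranges, end exclusive.
    """
    blinks, long_closures = [], []
    idx = 0
    for is_closed, run in groupby(closed):
        length = len(list(run))              # length of this run of equal values
        if is_closed:
            target = blinks if length <= max_blink_frames else long_closures
            target.append((idx, idx + length))
        idx += length
    return blinks, long_closures
```

With such a split, the two kinds of eye event could be handled as separate targets, which is the spirit of the separate modeling described in the abstract.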

arXiv information

Authors	Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Published	2023-03-24 14:51:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
