ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

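Summary

Lip-syncing a video to a given audio track underlies a range of applications, including the creation of virtual presenters and performers.
Recent work pursues high-fidelity lip-sync with a variety of techniques, but these task-oriented models either require long videos for clip-specific training or leave visible artifacts.
This paper proposes ReSyncer, a unified and effective framework that synchronizes generalized audio-visual facial information.
The key design is to revisit and rewire the Style-based generator so that it efficiently adopts the 3D facial dynamics predicted by a principled style-injected Transformer.
By simply reconfiguring the information insertion mechanisms within the noise and style space, the framework fuses motion and appearance under unified training.
Extensive experiments show that ReSyncer not only produces high-fidelity lip-synced videos driven by audio, but also supports several properties suited to creating virtual presenters and performers: fast personalized fine-tuning, video-driven lip-syncing, the transfer of speaking styles, and even face swapping.
Resources are available at https://guanjz20.github.io/projects/ReSyncer.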

Abstract (original)

Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyncer, that synchronizes generalized audio-visual facial information. The key design is revisiting and rewiring the Style-based generator to efficiently adopt 3D facial dynamics predicted by a principled style-injected Transformer. By simply re-configuring the information insertion mechanisms within the noise and style space, our framework fuses motion and appearance with unified training. Extensive experiments demonstrate that ReSyncer not only produces high-fidelity lip-synced videos according to audio, but also supports multiple appealing properties that are suitable for creating virtual presenters and performers, including fast personalized fine-tuning, video-driven lip-syncing, the transfer of speaking styles, and even face swapping. Resources can be found at https://guanjz20.github.io/projects/ReSyncer.

arXiv information

Authors	Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu
Published	2024-08-06 16:31:45+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
