One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2

Summary

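Recent work has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation. However, these approaches rely on at least one of the following: explicit 2D/3D priors, optical-flow-based warping as motion descriptors, or off-the-shelf encoders. These dependencies limit their performance (e.g., inconsistent predictions, inability to capture fine facial details and accessories, poor generalization, artifacts).
We propose an end-to-end framework that simultaneously supports face attribute edits, facial motions and deformations, and facial identity control for video generation.
It employs a hybrid latent space that encodes a given frame into a pair of latents: an identity latent, $\mathcal{W}_{ID}$, and a facial deformation latent, $\mathcal{S}_F$, which reside in the $W+$ and $SS$ spaces of StyleGAN2, respectively.
The framework thereby combines the impressive editability-distortion trade-off of $W+$ with the high disentanglement properties of $SS$.
These hybrid latents drive the StyleGAN2 generator to achieve high-fidelity face video re-enactment at $1024^2$.
Furthermore, the model supports the generation of realistic re-enactment videos with other latent-based semantic edits (e.g., beard, age, make-up).
Qualitative and quantitative analyses against state-of-the-art methods demonstrate the superiority of the proposed approach.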

Abstract (original)

While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN’s high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors, optical flow based warping as motion descriptors, off-the-shelf encoders, etc., which constrain their performance (e.g., inconsistent predictions, inability to capture fine facial details and accessories, poor generalization, artifacts). We propose an end-to-end framework for simultaneously supporting face attribute edits, facial motions and deformations, and facial identity control for video generation. It employs a hybrid latent-space that encodes a given frame into a pair of latents: Identity latent, $\mathcal{W}_{ID}$, and Facial deformation latent, $\mathcal{S}_F$, that respectively reside in the $W+$ and $SS$ spaces of StyleGAN2. Thereby, incorporating the impressive editability-distortion trade-off of $W+$ and the high disentanglement properties of $SS$. These hybrid latents employ the StyleGAN2 generator to achieve high-fidelity face video re-enactment at $1024^2$. Furthermore, the model supports the generation of realistic re-enactment videos with other latent-based semantic edits (e.g., beard, age, make-up, etc.). Qualitative and quantitative analyses performed against state-of-the-art methods demonstrate the superiority of the proposed approach.
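
The data flow described in the abstract can be illustrated with a minimal sketch: the identity latent is taken once from a single source frame (the "one-shot" part), and each driving frame contributes only its facial deformation latent. Everything named here is hypothetical: `encode` is a random stand-in for the paper's encoder, frames are represented by integer seeds, and the latent sizes (18 x 512 for $W+$, 9088 channels for StyleSpace) are the sizes commonly reported for a $1024^2$ StyleGAN2 rather than values stated in the abstract. How the two latents are combined before entering the generator is not described in the abstract, so the sketch stops at pairing them.

```python
import random
from dataclasses import dataclass

# Assumed sizes for a 1024x1024 StyleGAN2 (not stated in the abstract).
W_PLUS_LAYERS, W_DIM = 18, 512
S_DIM = 9088


@dataclass
class HybridLatent:
    w_id: list  # identity latent in W+, shape (18, 512)
    s_f: list   # facial deformation latent in StyleSpace, length 9088


def encode(frame_seed: int) -> HybridLatent:
    """Hypothetical stand-in for the encoder: random latents of the assumed shapes."""
    rng = random.Random(frame_seed)
    w_id = [[rng.gauss(0.0, 1.0) for _ in range(W_DIM)] for _ in range(W_PLUS_LAYERS)]
    s_f = [rng.gauss(0.0, 1.0) for _ in range(S_DIM)]
    return HybridLatent(w_id, s_f)


def reenact(source_frame: int, driving_frames: list) -> list:
    """Pair the source identity latent with each driving frame's deformation latent."""
    w_id = encode(source_frame).w_id  # computed once from the single source frame
    return [HybridLatent(w_id, encode(d).s_f) for d in driving_frames]


latents = reenact(source_frame=0, driving_frames=[1, 2, 3])
print(len(latents), len(latents[0].w_id), len(latents[0].s_f))
```

In a real pipeline each resulting `HybridLatent` would be passed to a StyleGAN2 generator to render one output frame; that step is omitted here because it depends on details the abstract does not give.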

arXiv information

Authors	Trevine Oorloff, Yaser Yacoob
Published	2023-02-15 18:34:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
