Controllable One-Shot Face Video Synthesis With Semantic Aware Prior

Summary

Title: Controllable One-Shot Face Video Synthesis With Semantic Aware Prior

Summary:

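– The controllable one-shot talking-head synthesis task aims to animate a source image into a different pose and expression, as determined by a driving frame.
– Recent methods generate video by warping appearance features extracted from the source, using motion fields estimated from sparse keypoints that are learned in an unsupervised manner. Because this formulation is lightweight, these methods suit video conferencing at reduced bandwidth.
– However, current methods have two major limitations: 1) generation quality is unsatisfactory for large head poses, and there is observable pose misalignment between the source and the first frame of the driving video; 2) they fail to capture fine yet critical facial motion details because they lack semantic understanding and appropriate face geometry regularization.
– To address these problems, we propose a new method that leverages rich face prior information. Under equivalent bandwidth, the proposed model generates face videos with improved semantic consistency (7% better than the baseline in average keypoint distance) and better expression preservation (15% better than the baseline in average emotion embedding distance). Incorporating this prior also provides a convenient interface for highly controllable generation in both pose and expression.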
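The keypoint-driven warping described in the bullets above can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian-weighted blending of keypoint offsets, the nearest-neighbour sampling, and the names `dense_motion_from_keypoints` and `warp` are all assumptions chosen for illustration. Real systems typically operate on learned feature maps with bilinear sampling.

```python
import math


def dense_motion_from_keypoints(kp_src, kp_drv, height, width, sigma=2.0):
    """Build a backward motion field from sparse keypoints (illustrative only).

    For every pixel of the driving frame, return the (x, y) location in the
    source that should be sampled. Offsets of individual keypoints are
    blended with Gaussian weights centred on the driving keypoints.
    """
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            weights = [
                math.exp(-((x - dx) ** 2 + (y - dy) ** 2) / (2 * sigma ** 2))
                for dx, dy in kp_drv
            ]
            total = sum(weights) + 1e-8  # avoid division by zero far from keypoints
            off_x = sum(
                w * (s[0] - d[0]) for w, s, d in zip(weights, kp_src, kp_drv)
            ) / total
            off_y = sum(
                w * (s[1] - d[1]) for w, s, d in zip(weights, kp_src, kp_drv)
            ) / total
            row.append((x + off_x, y + off_y))
        field.append(row)
    return field


def warp(feature, field, fill=0.0):
    """Warp a 2D feature map with nearest-neighbour sampling (a simplification)."""
    h, w = len(feature), len(feature[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = field[y][x]
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = feature[iy][ix]
    return out


# Toy example: one keypoint moves one pixel to the right, so the whole
# feature map shifts right by one column.
feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
field = dense_motion_from_keypoints([(0, 0)], [(1, 0)], height=3, width=3)
warped = warp(feature, field)
```

Only the keypoints need to be transmitted per frame in such a scheme, which is why this family of methods is considered suitable for low-bandwidth video conferencing.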

Summary (original)

The one-shot talking-head synthesis task aims to animate a source image to another pose and expression, which is dictated by a driving frame. Recent methods rely on warping the appearance feature extracted from the source, by using motion fields estimated from the sparse keypoints, that are learned in an unsupervised manner. Due to their lightweight formulation, they are suitable for video conferencing with reduced bandwidth. However, based on our study, current methods suffer from two major limitations: 1) unsatisfactory generation quality in the case of large head poses and the existence of observable pose misalignment between the source and the first frame in driving videos. 2) fail to capture fine yet critical face motion details due to the lack of semantic understanding and appropriate face geometry regularization. To address these shortcomings, we propose a novel method that leverages the rich face prior information, the proposed model can generate face videos with improved semantic consistency (improve baseline by $7\%$ in average keypoint distance) and expression-preserving (outperform baseline by $15 \%$ in average emotion embedding distance) under equivalent bandwidth. Additionally, incorporating such prior information provides us with a convenient interface to achieve highly controllable generation in terms of both pose and expression.
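The abstract reports its gains as relative improvements in average keypoint distance. As a rough sketch, that metric is commonly computed as the mean Euclidean distance between corresponding facial keypoints detected on the generated and ground-truth frames. The function names and all numbers below are hypothetical and are not taken from the paper.

```python
import math


def average_keypoint_distance(pred, target):
    """Mean Euclidean distance between corresponding keypoints."""
    if not pred or len(pred) != len(target):
        raise ValueError("keypoint lists must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def relative_improvement(baseline, ours):
    """Fractional reduction of an error metric relative to a baseline (lower is better)."""
    return (baseline - ours) / baseline


# Hypothetical keypoints, for illustration only.
akd = average_keypoint_distance([(0, 0), (3, 4)], [(0, 0), (0, 0)])

# Hypothetical metric values: an error of 1.5 against a baseline of 2.0
# corresponds to a 25% relative improvement.
gain = relative_improvement(2.0, 1.5)
```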

arXiv information

Authors	Kangning Liu, Yu-Chuan Su, Wei Hong, Ruijin Cang, Xuhui Jia
Published	2023-04-27 19:17:13+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
