OPT: One-shot Pose-Controllable Talking Head Generation

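Summary

One-shot talking head generation produces lip-synced talking heads from arbitrary audio and a single source face.
To ensure naturalness and realism, recent methods propose free head-pose control rather than editing only the mouth region.
However, existing methods fail to preserve the exact identity of the source face when generating head motion.
To resolve this identity mismatch and achieve high-quality free pose control, we present the One-shot Pose-controllable Talking head generation network (OPT).
Specifically, the Audio Feature Disentanglement Module separates content features from the audio, removing the influence of speaker-specific information contained in arbitrary driving audio.
The mouth expression feature is then extracted from the content features and the source face; during this step, a landmark loss is designed to improve the accuracy of the facial structure and the quality of identity preservation.
Finally, to achieve free pose control, controllable head-pose features from a reference video are fed into the Video Generator together with the expression feature and the source face to generate new talking heads.
Extensive quantitative and qualitative experiments confirm that OPT generates high-quality, pose-controllable talking heads without identity mismatch, outperforming previous state-of-the-art (SOTA) methods.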
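The data flow described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`disentangle_audio`, `extract_expression`, `generate_frame`, `talking_head`) and the toy list-of-floats features are hypothetical stand-ins for learned networks and tensors. Only the order in which features pass between modules follows the summary.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class AudioFeatures:
    content: Vector  # speaker-independent content (what is said)
    speaker: Vector  # speaker-specific information (who is speaking)


def disentangle_audio(audio_embedding: Vector, content_dim: int) -> AudioFeatures:
    """Hypothetical stand-in for the Audio Feature Disentanglement Module.

    Assumption for illustration: the first `content_dim` values carry content
    and the remaining values carry speaker identity. The real module is learned.
    """
    return AudioFeatures(content=audio_embedding[:content_dim],
                         speaker=audio_embedding[content_dim:])


def extract_expression(content: Vector, source_face: Vector) -> Vector:
    """Hypothetical mouth-expression extractor; here a toy concatenation."""
    return content + source_face


def generate_frame(source_face: Vector, expression: Vector,
                   head_pose: Vector) -> Dict[str, Vector]:
    """Hypothetical Video Generator stand-in: records which inputs it receives."""
    return {"source": source_face, "expression": expression, "pose": head_pose}


def talking_head(audio_frames: List[Vector], source_face: Vector,
                 reference_poses: List[Vector],
                 content_dim: int) -> List[Dict[str, Vector]]:
    """Runs one toy frame per (audio frame, reference pose) pair."""
    frames = []
    for audio, pose in zip(audio_frames, reference_poses):
        feats = disentangle_audio(audio, content_dim)
        # Only the content part is passed on; speaker features are dropped.
        expression = extract_expression(feats.content, source_face)
        # Identity comes from the source face, motion from the reference pose.
        frames.append(generate_frame(source_face, expression, pose))
    return frames
```

Calling `talking_head` with two audio frames and two reference poses yields two frame records whose expression features contain no speaker-specific values. In this sketch identity enters only through `source_face` and head motion only through `reference_poses`, which is the separation the summary credits with avoiding identity mismatch.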

Summary (original)

One-shot talking head generation produces lip-sync talking heads based on arbitrary audio and one source face. To guarantee the naturalness and realness, recent methods propose to achieve free pose control instead of simply editing mouth areas. However, existing methods do not preserve accurate identity of source face when generating head motions. To solve the identity mismatch problem and achieve high-quality free pose control, we present One-shot Pose-controllable Talking head generation network (OPT). Specifically, the Audio Feature Disentanglement Module separates content features from audios, eliminating the influence of speaker-specific information contained in arbitrary driving audios. Later, the mouth expression feature is extracted from the content feature and source face, during which the landmark loss is designed to enhance the accuracy of facial structure and identity preserving quality. Finally, to achieve free pose control, controllable head pose features from reference videos are fed into the Video Generator along with the expression feature and source face to generate new talking heads. Extensive quantitative and qualitative experimental results verify that OPT generates high-quality pose-controllable talking heads with no identity mismatch problem, outperforming previous SOTA methods.
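The abstract does not give the form of the landmark loss. One common choice, assumed here purely for illustration, is the mean Euclidean (L2) distance between corresponding 2D facial landmarks of the generated face and the ground-truth face. The function name `landmark_loss` and the (x, y) tuple representation are hypothetical; the paper may use a different norm, weighting, or landmark set.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def landmark_loss(predicted: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding landmarks.

    Assumed formulation for illustration only, not necessarily the one in OPT.
    """
    if len(predicted) != len(target):
        raise ValueError("landmark sets must have the same length")
    if not predicted:
        raise ValueError("need at least one landmark")
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)
```

For example, `landmark_loss([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])` returns 2.5: the per-landmark distances are 0 and 5, averaged over two landmarks. Penalizing landmark positions directly constrains facial geometry such as jaw and lip shape, which is why such a term can help preserve identity.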

arXiv information

Authors	Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han
Published	2023-02-16 10:26:52+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
