Learn2Talk: 3D Talking Face Learns from 2D Talking Face

Abstract

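Speech-driven facial animation methods generally fall into two main classes, 3D and 2D talking face, both of which have attracted considerable research attention in recent years.
However, to the best of our knowledge, research on 3D talking face has not gone as deep as research on 2D talking face in terms of lip synchronization (lip-sync) and speech perception.
To bridge the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which builds a better 3D talking face network by exploiting two points of expertise from the field of 2D talking face.
First, inspired by the audio-video sync network, a 3D sync-lip expert model is devised to pursue lip-sync between audio and 3D facial motion.
Second, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D-motion regression network, improving 3D vertex accuracy.
Extensive experiments show the advantages of the proposed framework over state-of-the-art methods in terms of lip-sync, vertex accuracy, and speech perception.
Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting based avatar animation.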
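
The abstract names two training signals (a lip-sync expert and a 2D-derived teacher) but does not give their formulation. As a rough illustration only, the sketch below combines a ground-truth vertex loss, a teacher-guidance loss, and a cosine-similarity sync loss. All function names, the loss weights `w_teacher` and `w_sync`, and the cosine-based sync term are assumptions made for this example, not the paper's actual losses.

```python
import numpy as np

def vertex_loss(pred, target):
    """Mean squared error between two sets of vertex positions."""
    return float(np.mean((pred - target) ** 2))

def sync_loss(audio_emb, motion_emb, eps=1e-8):
    """One minus the mean cosine similarity of paired embeddings.

    Hypothetical stand-in for a sync expert's score: the embeddings are
    assumed to come from some audio encoder and some 3D motion encoder.
    """
    a = audio_emb / (np.linalg.norm(audio_emb, axis=-1, keepdims=True) + eps)
    m = motion_emb / (np.linalg.norm(motion_emb, axis=-1, keepdims=True) + eps)
    return float(1.0 - np.mean(np.sum(a * m, axis=-1)))

def total_loss(pred, gt, teacher, audio_emb, motion_emb,
               w_teacher=0.5, w_sync=0.1):
    """Weighted sum of the three terms; the weights are illustrative guesses."""
    return (vertex_loss(pred, gt)
            + w_teacher * vertex_loss(pred, teacher)
            + w_sync * sync_loss(audio_emb, motion_emb))
```

In this sketch, `pred`, `gt`, and `teacher` would be arrays of shape (frames, vertices, 3), and the two embedding arrays would have shape (frames, dim). The total is near zero only when the prediction matches both the ground truth and the teacher output and the audio and motion embeddings point in the same direction.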

Abstract (original)

Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex accuracy and speech perception, compared with state-of-the-arts. Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting based avatar animation.

arXiv information

Authors	Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin
Published	2024-04-19 13:45:14+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
