Expression-preserving face frontalization improves visually assisted speech processing

Summary

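Face frontalization is the synthesis of a frontally viewed face from a face viewed from an arbitrary angle. In this paper, we propose a frontalization method that preserves the non-rigid deformations of the face in order to improve the performance of visually assisted speech communication. The method alternates between estimating (i) the rigid transformation (scale, rotation, translation) and (ii) the non-rigid deformation between the arbitrarily viewed face and a face model. Its two key advantages are that it can handle non-Gaussian errors in the data and that it incorporates a dynamic face deformation model. To this end, we combine the generalized Student t-distribution with a linear dynamical system to account for both rigid head motion and the time-varying facial deformations caused by speech. We also propose using the zero-mean normalized cross-correlation (ZNCC) score to evaluate how well facial expressions are preserved. We evaluate the method thoroughly and compare it with several state-of-the-art methods based either on traditional geometric models or on deep learning. Furthermore, we show that, when incorporated into deep learning pipelines, namely lip reading and speech enhancement, the method improves word recognition and speech intelligibility scores by a considerable margin. Supplementary material is available at https://team.inria.fr/robotlearn/research/facefrontalization-benchmark/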
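The summary names the ZNCC score as the expression-preservation metric but does not spell it out. Below is a minimal sketch of the standard ZNCC between two equal-length signals, assuming the inputs are something like flattened facial-landmark coordinates; which quantities the paper actually correlates is not stated here, and the function name `zncc` is illustrative. ZNCC lies in [-1, 1] and equals 1 when one signal is a positive affine function of the other, which makes it insensitive to global offset and scale.

```python
import math
from typing import Sequence


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length signals.

    ZNCC(a, b) = sum((a_i - mean_a) * (b_i - mean_b))
                 / sqrt(sum((a_i - mean_a)^2) * sum((b_i - mean_b)^2))
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]  # zero-mean copy of a
    db = [y - mean_b for y in b]  # zero-mean copy of b
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0.0:
        raise ValueError("ZNCC is undefined when an input is constant")
    return num / den
```

For example, `zncc([1, 2, 3], [2, 4, 6])` gives 1.0 and `zncc([1, 2, 3], [3, 2, 1])` gives -1.0.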

Summary (original)

Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost the performance of visually assisted speech communication. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data and it incorporates a dynamical face deformation model. For that purpose, we use the generalized Student t-distribution in combination with a linear dynamic system in order to account for both rigid head motions and time-varying facial deformations caused by speech production. We propose to use the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability of the method to preserve facial expressions. The method is thoroughly evaluated and compared with several state-of-the-art methods, either based on traditional geometric models or on deep learning. Moreover, we show that the method, when incorporated into deep learning pipelines, namely lip reading and speech enhancement, improves word recognition and speech intelligibility scores by a considerable margin. Supplemental material is accessible at https://team.inria.fr/robotlearn/research/facefrontalization-benchmark/
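The rigid step and the robustness to non-Gaussian errors can be illustrated with a simplified stand-in: a 2-D similarity transform (scale, rotation, translation) fitted by iteratively reweighted least squares, with weights taken from the usual EM update for an ordinary (not generalized) Student t-distribution. This is a hypothetical sketch, not the authors' algorithm: it works on 2-D points encoded as complex numbers, omits the non-rigid deformation and the linear dynamical system, and the names `weighted_similarity`, `robust_align`, `nu` and `iters` (with their defaults) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_similarity(
    src: Sequence[complex], dst: Sequence[complex], w: Sequence[float]
) -> Tuple[complex, complex]:
    """Closed-form weighted least-squares 2-D similarity transform.

    Points are complex numbers x + iy. Returns (a, b) minimizing
    sum_i w_i * |a * src_i + b - dst_i|^2, where a encodes scale * rotation
    and b is the translation.
    """
    sw = sum(w)
    cs = sum(wi * s for wi, s in zip(w, src)) / sw  # weighted source centroid
    cd = sum(wi * d for wi, d in zip(w, dst)) / sw  # weighted target centroid
    num = sum(wi * (d - cd) * (s - cs).conjugate() for wi, s, d in zip(w, src, dst))
    den = sum(wi * abs(s - cs) ** 2 for wi, s in zip(w, src))
    a = num / den
    return a, cd - a * cs


def robust_align(
    src: Sequence[complex], dst: Sequence[complex], nu: float = 3.0, iters: int = 20
) -> Tuple[complex, complex, List[float]]:
    """Fit a similarity transform under an isotropic 2-D Student-t error model.

    Alternates a weighted fit with the Student-t EM weight update
    w_i = (nu + 2) / (nu + r_i^2 / sigma^2), so that points with large
    residuals (outliers) are progressively down-weighted.
    """
    n = len(src)
    w = [1.0] * n  # first pass is ordinary least squares
    a, b = 1 + 0j, 0j  # identity, returned only if iters == 0
    for _ in range(iters):
        a, b = weighted_similarity(src, dst, w)
        r2 = [abs(a * s + b - d) ** 2 for s, d in zip(src, dst)]
        # Isotropic scale update for D = 2; the floor avoids division by zero on exact fits.
        sigma2 = max(sum(wi * ri for wi, ri in zip(w, r2)) / (2 * n), 1e-12)
        w = [(nu + 2.0) / (nu + ri / sigma2) for ri in r2]
    return a, b, w
```

For instance, if a 3 × 3 grid is mapped by a = 2j (a 90° rotation with scale 2) and b = 1 + 1j, and the centre target point is then displaced by 20, a plain least-squares fit shifts the translation by 20/9, whereas `robust_align` drives the outlier's weight toward zero and recovers (a, b) closely.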

arXiv information

Authors	Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
Published	2022-07-06 16:36:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL