Towards a Pipeline for Real-Time Visualization of Faces for VR-based Telepresence and Live Broadcasting Utilizing Neural Rendering

Summary

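Head-mounted displays (HMDs) for virtual reality (VR) have become widely used in the consumer market. However, they hide a large part of each participant's face, which is a considerable obstacle to realistic face-to-face conversation in VR. Even with image streams from cameras mounted directly on the HMD, stitching the whole face into a convincing image remains difficult. The capture angles are extreme, and the wide field of view causes strong lens distortion. Compared with the long history of VR research, reconstructing faces hidden beneath an HMD is a very recent research topic. Current state-of-the-art solutions show photorealistic 3D reconstruction results, but they require expensive laboratory equipment and have large computational costs. We present an approach that focuses on low-cost hardware and runs on a commodity gaming computer with a single GPU. We exploit the advantages of an end-to-end pipeline built on Generative Adversarial Networks (GANs). The GAN generates a frontal-facing 2.5D point cloud, based on a training dataset captured with an RGBD camera. Training is performed offline, while reconstruction runs in real time. The results show adequate reconstruction quality for the "learned" facial expressions. Expressions the network has not learned produce artifacts and can trigger the uncanny valley effect.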

Summary (original)

While head-mounted displays (HMDs) for Virtual Reality (VR) have become widely available in the consumer market, they pose a considerable obstacle for a realistic face-to-face conversation in VR since HMDs hide a significant portion of the participants' faces. Even with image streams from cameras directly attached to an HMD, stitching together a convincing image of an entire face remains a challenging task because of extreme capture angles and strong lens distortions due to a wide field of view. Compared to the long line of research in VR, reconstruction of faces hidden beneath an HMD is a very recent topic of research. While the current state-of-the-art solutions demonstrate photo-realistic 3D reconstruction results, they require high-cost laboratory equipment and large computational costs. We present an approach that focuses on low-cost hardware and can be used on a commodity gaming computer with a single GPU. We leverage the benefits of an end-to-end pipeline by means of Generative Adversarial Networks (GAN). Our GAN produces a frontal-facing 2.5D point cloud based on a training dataset captured with an RGBD camera. In our approach, the training process is offline, while the reconstruction runs in real-time. Our results show adequate reconstruction quality within the 'learned' expressions. Expressions not learned by the network produce artifacts and can trigger the Uncanny Valley effect.
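A "2.5D" point cloud is essentially a single-view depth map. Each pixel carries at most one depth value, so geometry behind the visible surface is absent. The sketch below shows how an RGBD frame, such as those in the training data, can be back-projected into such a colored point cloud. It is a minimal illustration only and assumes an ideal, undistorted pinhole camera. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical. The abstract does not specify the paper's camera model, data format, or how the GAN output is rendered, and the code does not model the GAN itself.

```python
from typing import List, Tuple

# A colored 3D point: (x, y, z, (r, g, b)) in camera coordinates.
Point = Tuple[float, float, float, Tuple[int, int, int]]


def depth_to_point_cloud(
    depth: List[List[float]],
    rgb: List[List[Tuple[int, int, int]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth map (one depth per pixel) into colored 3D points.

    Assumes a hypothetical ideal pinhole camera without lens distortion.
    Pixels with depth <= 0 are treated as missing measurements and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index, z: depth
            if z <= 0.0:
                continue                     # no depth reading at this pixel
            # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, rgb[v][u]))
    return points


if __name__ == "__main__":
    gray = (128, 128, 128)
    cloud = depth_to_point_cloud(
        depth=[[1.0, 0.0], [2.0, 1.0]],
        rgb=[[gray, gray], [gray, gray]],
        fx=1.0, fy=1.0, cx=0.0, cy=0.0,
    )
    for p in cloud:
        print(p)
```

Because every point comes from one pixel of one view, a frontal 2.5D cloud can be produced quickly from an image-like network output. The trade-off is that it cannot represent the sides or back of the head.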

arXiv information

Authors	Philipp Ladwig, Rene Ebertowski, Alexander Pech, Ralf Dörner, Christian Geiger
Published	2023-01-04 08:49:51+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, DeepL
