LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field

Summary

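Recent work has shown that neural radiance fields (NeRFs) built on top of parametric models reach SOTA quality and can construct photorealistic head avatars from a monocular video.
However, one major limitation of NeRF-based avatars is their slow rendering speed, caused by NeRF's dense point sampling, which prevents their broader use on resource-constrained devices.
We introduce LightAvatar, the first head avatar model based on neural light fields (NeLFs).
LightAvatar renders an image from 3DMM parameters and a camera pose via a single network forward pass, without using mesh or volume rendering.
Although conceptually appealing, the proposed approach poses significant challenges for real-time efficiency and training stability.
To address them, we introduce dedicated network designs that obtain proper representations for the NeLF model while maintaining a low FLOPs budget.
Meanwhile, we use a distillation-based training strategy in which a pretrained avatar model serves as the teacher and synthesizes abundant pseudo data for training.
A warping field network is introduced to correct the fitting error in the real data so that the model can learn better.
Extensive experiments suggest that our method achieves new SOTA image quality, quantitatively or qualitatively, while being significantly faster than its counterparts, reaching 174.1 FPS at 512×512 resolution on a consumer-grade GPU (RTX3090) without customized optimization.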
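As a rough illustration of why removing per-ray point sampling matters, the sketch below counts network evaluations per 512×512 frame for a NeRF-style renderer and a NeLF-style renderer, then converts the reported 174.1 FPS into a per-frame time. The 64 samples per ray is a hypothetical figure, and treating the light field as exactly one query per ray is a simplification; the two networks also differ in cost per query, so only the order of magnitude is meaningful here.

```python
# Back-of-the-envelope comparison of network queries per frame for a
# NeRF-style renderer (many samples per ray) versus a NeLF-style renderer
# (one query per ray). The samples-per-ray value is hypothetical and is
# not taken from the paper.


def queries_per_frame(height: int, width: int, samples_per_ray: int) -> int:
    """Number of network evaluations needed to render one frame."""
    rays = height * width  # one camera ray per pixel
    return rays * samples_per_ray


def frame_time_ms(fps: float) -> float:
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


H = W = 512
nerf_queries = queries_per_frame(H, W, samples_per_ray=64)  # hypothetical NeRF sampling
nelf_queries = queries_per_frame(H, W, samples_per_ray=1)   # light field: one query per ray

print(f"NeRF-style queries per frame: {nerf_queries:,}")
print(f"NeLF-style queries per frame: {nelf_queries:,}")
print(f"ratio: {nerf_queries // nelf_queries}x")
print(f"174.1 FPS -> {frame_time_ms(174.1):.2f} ms per frame")
```

The count scales linearly with the number of samples per ray, which is why a renderer that needs only one forward pass per ray (or per image) can afford a heavier network and still run in real time.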

Summary (original)

Recent works have shown that neural radiance fields (NeRFs) on top of parametric models have reached SOTA quality to build photorealistic head avatars from a monocular video. However, one major limitation of the NeRF-based avatars is the slow rendering speed due to the dense point sampling of NeRF, preventing them from broader utility on resource-constrained devices. We introduce LightAvatar, the first head avatar model based on neural light fields (NeLFs). LightAvatar renders an image from 3DMM parameters and a camera pose via a single network forward pass, without using mesh or volume rendering. The proposed approach, while being conceptually appealing, poses a significant challenge towards real-time efficiency and training stability. To resolve them, we introduce dedicated network designs to obtain proper representations for the NeLF model and maintain a low FLOPs budget. Meanwhile, we tap into a distillation-based training strategy that uses a pretrained avatar model as teacher to synthesize abundant pseudo data for training. A warping field network is introduced to correct the fitting error in the real data so that the model can learn better. Extensive experiments suggest that our method can achieve new SOTA image quality quantitatively or qualitatively, while being significantly faster than the counterparts, reporting 174.1 FPS (512×512 resolution) on a consumer-grade GPU (RTX3090) with no customized optimization.
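The distillation idea can be sketched with toy stand-ins: a fixed "teacher" function plays the role of the pretrained avatar model, random control inputs are labeled by the teacher to produce as much pseudo data as needed, and a small "student" is fit to those labels. Everything here (the one-dimensional expression and pose inputs, the linear teacher and student, the SGD settings, and all function names) is a hypothetical simplification for illustration, not the paper's architecture or training recipe.

```python
import random


def teacher(expr: float, pose: float) -> float:
    """Hypothetical stand-in for a pretrained (slow) avatar model."""
    return 0.5 + 0.3 * expr - 0.2 * pose


def make_pseudo_data(n: int, seed: int = 0):
    """Sample random control inputs and label them with the teacher's output."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        expr = rng.uniform(-1.0, 1.0)  # hypothetical 1-D "expression" code
        pose = rng.uniform(-1.0, 1.0)  # hypothetical 1-D "camera pose"
        data.append(((expr, pose), teacher(expr, pose)))
    return data


def predict(w, expr: float, pose: float) -> float:
    """Linear student: w0 + w1 * expr + w2 * pose."""
    return w[0] + w[1] * expr + w[2] * pose


def train_student(data, epochs: int = 50, lr: float = 0.1):
    """Fit the student to the teacher's pseudo labels by SGD on squared error."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (expr, pose), target in data:
            err = predict(w, expr, pose) - target
            # Gradient of 0.5 * err**2 with respect to each weight.
            w[0] -= lr * err
            w[1] -= lr * err * expr
            w[2] -= lr * err * pose
    return w


def mse(w, data) -> float:
    """Mean squared error of the student against the teacher's labels."""
    return sum((predict(w, e, p) - y) ** 2 for (e, p), y in data) / len(data)


if __name__ == "__main__":
    pseudo = make_pseudo_data(1000)
    student = train_student(pseudo)
    print("student weights:", [round(v, 3) for v in student])
    print("MSE vs teacher:", mse(student, pseudo))
```

Because the teacher can be queried at any input, the pseudo dataset can be made as large as needed. In this linear toy the student recovers the teacher's weights almost exactly; a real student network would only approximate a far more complex teacher.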

arXiv information

Authors	Huan Wang, Feitong Tan, Ziqian Bai, Yinda Zhang, Shichen Liu, Qiangeng Xu, Menglei Chai, Anish Prabhu, Rohit Pandey, Sean Fanello, Zeng Huang, Yun Fu
Published	2024-09-26 17:00:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
