Gaussian Shell Maps for Efficient 3D Human Generation

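Summary

Efficient generation of 3D digital humans matters in several industries, including virtual reality, social media, and film production.
3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity in the assets they generate.
However, current 3D GAN architectures typically rely on volume representations that are slow to render, which hampers GAN training and forces the use of 2D upsamplers that are not multi-view consistent.
This work introduces Gaussian Shell Maps (GSMs), a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives through an articulable, multi-shell scaffold.
In this setting, a CNN generates a 3D texture stack whose features are mapped onto the shells.
The shells are inflated and deflated versions of a template surface of a digital human in a canonical body pose.
Instead of rasterizing the shells directly, the method samples 3D Gaussians on the shells, with the Gaussians' attributes encoded in the texture features.
These Gaussians are rendered efficiently and differentiably.
The ability to articulate the shells is important during GAN training and, at inference time, for deforming a body into arbitrary user-defined poses.
The efficient rendering scheme removes the need for view-inconsistent upsamplers and produces high-quality, multi-view-consistent renderings at a native resolution of $512 \times 512$ pixels.
The authors demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion.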
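
The shell construction and Gaussian sampling described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the octahedron template, the normal offsets, the number of samples, and the function names `vertex_normals`, `make_shells`, and `sample_gaussian_centers` are all assumptions made for illustration. The sketch only places Gaussian centers on the shells; the texture-feature lookup, articulation, and differentiable rendering are omitted.

```python
import numpy as np

def vertex_normals(verts, faces):
    """Area-weighted unit vertex normals of a triangle mesh."""
    tri = verts[faces]                                   # (F, 3, 3)
    face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = np.zeros_like(verts)
    for k in range(3):
        np.add.at(normals, faces[:, k], face_n)          # accumulate per vertex
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def make_shells(verts, faces, offsets):
    """Offset the template along its normals (negative: deflated, positive: inflated)."""
    n = vertex_normals(verts, faces)
    return np.stack([verts + d * n for d in offsets])    # (S, V, 3)

def sample_gaussian_centers(shells, faces, n_per_shell, rng):
    """Sample surface points on every shell from shared faces and barycentric weights."""
    f = rng.integers(0, len(faces), size=n_per_shell)    # random face per sample
    w = rng.dirichlet(np.ones(3), size=n_per_shell)      # (N, 3), each row sums to 1
    tri = shells[:, faces[f]]                            # (S, N, 3, 3)
    return np.einsum("nk,snkd->snd", w, tri)             # (S, N, 3)

# Hypothetical stand-in for a body template: a unit octahedron with outward-facing faces.
verts = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
faces = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                  [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])

offsets = [-0.1, 0.0, 0.1]                               # assumed shell offsets
shells = make_shells(verts, faces, offsets)
centers = sample_gaussian_centers(shells, faces, 100, np.random.default_rng(0))
```

Because the same face indices and barycentric weights are reused on every shell, each sample has a corresponding point on all shells. In a fuller pipeline, those surface coordinates would be the natural place to look up per-Gaussian attributes from the generated texture stack.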

Summary (original)

Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering the GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives using an articulable multi shell–based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features. These Gaussians are efficiently and differentiably rendered. The ability to articulate the shells is important during GAN training and, at inference time, to deform a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view consistent renderings at a native resolution of $512 \times 512$ pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion.

arXiv information

Authors	Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein
Published	2023-11-29 18:04:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
