SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

Summary

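3D human digitization has long been a highly pursued yet challenging task.
Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but they remain constrained mainly by current paradigms and by the scarcity of 3D human assets.
Specifically, recent approaches fall into several paradigms: optimization-based and feed-forward (both single-view regression and multi-view generation followed by reconstruction).
These are limited, respectively, by slow speed, low quality, cascaded inference, and ambiguity in mapping low-dimensional planes to high-dimensional space caused by occlusion and invisibility.
Furthermore, existing 3D human assets remain small in scale and are insufficient for large-scale training.
To address these challenges, we propose a latent space generation paradigm for 3D human digitization, in which multi-view images are compressed into Gaussians via a UV-structured VAE and paired with DiT-based conditional generation. This turns the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift and also supports end-to-end inference.
In addition, we employ a multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset, which contains 1 million 3D Gaussian assets to support large-scale training.
Experimental results show that this paradigm, powered by large-scale training, produces high-quality 3D human Gaussians with intricate textures, facial details, and loose clothing deformation.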
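
To make the phrase "UV-structured" more concrete, here is a minimal sketch of how a UV attribute map could be unpacked into individual 3D Gaussians, one per valid texel. This is an illustration only, not the paper's implementation: the 14-channel layout, the `Gaussian` class, and the `uv_map_to_gaussians` function are all hypothetical names and assumptions introduced here, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical channel layout, assumed for illustration only:
# 3 position + 3 scale + 4 rotation (quaternion) + 1 opacity + 3 RGB = 14
CHANNELS = 14


@dataclass
class Gaussian:
    position: tuple
    scale: tuple
    rotation: tuple
    opacity: float
    color: tuple


def uv_map_to_gaussians(
    uv_map: Sequence[Sequence[Sequence[float]]],
    mask: Sequence[Sequence[bool]],
) -> List[Gaussian]:
    """Turn every valid texel of an H x W x 14 UV attribute map into one Gaussian."""
    gaussians = []
    for row, mask_row in zip(uv_map, mask):
        for texel, valid in zip(row, mask_row):
            if not valid:
                continue  # texel is not covered by the body surface in UV space
            if len(texel) != CHANNELS:
                raise ValueError(f"expected {CHANNELS} channels, got {len(texel)}")
            gaussians.append(
                Gaussian(
                    position=tuple(texel[0:3]),
                    scale=tuple(texel[3:6]),
                    rotation=tuple(texel[6:10]),
                    opacity=texel[10],
                    color=tuple(texel[11:14]),
                )
            )
    return gaussians
```

Under these assumptions, the appeal of a UV layout is that the set of Gaussians becomes a regular 2D grid, which image-style encoders, decoders, and diffusion transformers can process directly.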

Abstract (original)

3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcity of 3D human assets. Specifically, recent approaches fall into several paradigms: optimization-based and feed-forward (both single-view regression and multi-view generation with reconstruction). However, they are limited by slow speed, low quality, cascade reasoning, and ambiguity in mapping low-dimensional planes to high-dimensional space due to occlusion and invisibility, respectively. Furthermore, existing 3D human assets remain small-scale, insufficient for large-scale training. To address these challenges, we propose a latent space generation paradigm for 3D human digitization, which involves compressing multi-view images into Gaussians via a UV-structured VAE, along with DiT-based conditional generation, we transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift, which also supports end-to-end inference. In addition, we employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset, which contains $1$ million 3D Gaussian assets to support the large-scale training. Experimental results demonstrate that our paradigm, powered by large-scale training, produces high-quality 3D human Gaussians with intricate textures, facial details, and loose clothing deformation.

arXiv information

Authors	Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong
Published	2025-04-09 15:38:18+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
