LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

Summary

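Reconstructing an animatable 3D human from a single image is a challenging problem because geometry, appearance, and deformation are ambiguous to decouple.
Recent advances in 3D human reconstruction have focused mainly on static human modeling, and relying on synthetic 3D scans for training limits their ability to generalize.
Conversely, optimization-based video methods achieve higher fidelity but require controlled capture conditions and computationally intensive refinement.
Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model), which infers a high-fidelity avatar represented as 3D Gaussian splatting in a single feed-forward pass.
Our model uses a multimodal transformer architecture that encodes human-body positional features and image features with an attention mechanism, preserving clothing geometry and texture in detail.
To further improve facial identity preservation and fine-detail recovery, we propose a head feature pyramid encoding scheme that aggregates multi-scale features of the head region.
Extensive experiments show that LHM generates plausible animatable humans in seconds, without post-processing for the face and hands, and outperforms existing methods in both reconstruction accuracy and generalization.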
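The summary states that LHM represents the avatar as 3D Gaussian splatting. In the standard 3DGS formulation (Kerbl et al., 2023), each Gaussian stores a center, an opacity, color coefficients, a per-axis scale s, and a rotation quaternion q. Its covariance is built as Σ = R S Sᵀ Rᵀ with S = diag(s), which keeps Σ symmetric positive semi-definite however the network's raw outputs vary. The sketch below computes that covariance in plain Python. It follows the generic 3DGS parameterization, and whether LHM predicts exactly these attributes is an assumption; the helper names are hypothetical.

```python
import math


def quat_to_rotmat(w, x, y, z):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix, normalizing it first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose3(a):
    """Transpose a 3x3 matrix given as nested lists."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def gaussian_covariance(scale, quat):
    """Hypothetical helper: Sigma = R S S^T R^T, the usual 3DGS covariance."""
    r = quat_to_rotmat(*quat)
    s = [[scale[0], 0.0, 0.0], [0.0, scale[1], 0.0], [0.0, 0.0, scale[2]]]
    m = matmul3(r, s)              # M = R S
    return matmul3(m, transpose3(m))  # Sigma = M M^T, symmetric PSD by construction
```

With the identity quaternion (1, 0, 0, 0), the result is simply diag(s²). A 90° rotation about z swaps the x and y variances.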

Summary (original)

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation. Recent advances in 3D human reconstruction mainly focus on static human modeling, and the reliance on synthetic 3D scans for training limits their generalization ability. Conversely, optimization-based video methods achieve higher fidelity but demand controlled capture conditions and computationally intensive refinement processes. Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with attention mechanism, enabling detailed preservation of clothing geometry and texture. To further boost the face identity preservation and fine detail recovery, we propose a head feature pyramid encoding scheme to aggregate multi-scale features of the head regions. Extensive experiments demonstrate that our LHM generates plausible animatable humans in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.
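The abstract says the multimodal transformer encodes body positional features and image features together with attention. One common way to build such a block, used for example in MM-DiT-style transformers, is to concatenate the two token sets and run self-attention over the combined sequence. This lets body-point tokens attend to image tokens and vice versa. The sketch below shows single-head scaled dot-product attention over that concatenation in plain Python. The token sizes, the identity query/key/value projections, and the name `joint_attention` are illustrative assumptions, not LHM's actual configuration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(body_tokens, image_tokens):
    """Hypothetical illustration: single-head self-attention over the concatenation
    of body-point tokens and image tokens. Queries, keys, and values are the tokens
    themselves (identity projections) to keep the example short."""
    tokens = body_tokens + image_tokens  # one joint sequence of both modalities
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token in the joint sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens, so information crosses modalities.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    # Split the sequence back into the updated body tokens and image tokens.
    return out[:len(body_tokens)], out[len(body_tokens):]
```

For example, `joint_attention([[1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]])` returns a body token whose second component is now nonzero, because it has absorbed image information. A real model would use learned projection matrices and multiple heads. The point here is only that concatenation lets a single attention pass mix both modalities.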

arXiv information

Authors	Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo
Published	2025-03-13 17:59:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google