MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

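Summary

Multiview diffusion models have shown considerable success in image-to-3D generation for general objects.
When applied to human data, however, existing methods have yet to deliver promising results, largely because multiview attention is difficult to scale to higher resolutions.
This paper explores human multiview diffusion models at the megapixel level and introduces a solution called mesh attention that enables training at 1024×1024 resolution.
Using a clothed human mesh as a central coarse geometric representation, the proposed mesh attention leverages rasterization and projection to establish direct cross-view coordinate correspondences.
This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency.
Building on this foundation, the authors devise a mesh attention block and combine it with keypoint conditioning to create MEAT, a human-specific multiview diffusion model.
In addition, to address the longstanding problem of data scarcity, they present insights into applying multiview human motion videos to diffusion training.
Extensive experiments show that MEAT effectively generates dense, consistent multiview human images at the megapixel level and outperforms existing multiview diffusion methods.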

Abstract (original)

Multiview diffusion models have shown considerable success in image-to-3D generation for general objects. However, when applied to human data, existing methods have yet to deliver promising results, largely due to the challenges of scaling multiview attention to higher resolutions. In this paper, we explore human multiview diffusion models at the megapixel level and introduce a solution called mesh attention to enable training at 1024×1024 resolution. Using a clothed human mesh as a central coarse geometric representation, the proposed mesh attention leverages rasterization and projection to establish direct cross-view coordinate correspondences. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT. In addition, we present valuable insights into applying multiview human motion videos for diffusion training, addressing the longstanding issue of data scarcity. Extensive experiments show that MEAT effectively generates dense, consistent multiview human images at the megapixel level, outperforming existing multiview diffusion methods.
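The abstract says that mesh attention uses rasterization and projection to obtain direct cross-view coordinate correspondences. The following is only a minimal sketch of the projection half of that idea, not the authors' implementation: it assumes a simple pinhole camera orbiting the subject, and every name in it (`orbit_camera`, `project`, `cross_view_correspondences`, the focal length, the camera distance) is hypothetical. It also ignores occlusion, which a real rasterizer would resolve with a depth test.

```python
import math

def orbit_camera(yaw_deg, distance=2.5):
    """Hypothetical camera: the world is rotated about its y (up) axis by
    yaw_deg and then pushed `distance` units in front of the camera."""
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    rotation = [[c, 0.0, -s],
                [0.0, 1.0, 0.0],
                [s, 0.0, c]]
    translation = [0.0, 0.0, distance]
    return rotation, translation

def project(point, camera, focal=1100.0, size=1024):
    """Pinhole projection of a 3D point to pixel coordinates (u, v).
    Returns None if the point lies behind the camera."""
    rotation, translation = camera
    x, y, z = (sum(r * p for r, p in zip(row, point)) + t
               for row, t in zip(rotation, translation))
    if z <= 0:
        return None
    u = focal * x / z + size / 2
    v = -focal * y / z + size / 2  # image rows grow downward
    return (u, v)

def cross_view_correspondences(surface_points, yaws_deg):
    """For each 3D surface point, list its pixel location in every view.
    Pixels in one row all depict the same surface point, so attention
    could be restricted to them instead of to all pixels of all views."""
    cameras = [orbit_camera(yaw) for yaw in yaws_deg]
    return [[project(p, cam) for cam in cameras] for p in surface_points]

if __name__ == "__main__":
    # Two illustrative points standing in for mesh surface samples.
    points = [(0.0, 0.5, 0.0), (0.2, 0.0, 0.0)]
    for point, pixels in zip(points, cross_view_correspondences(points, [0, 90])):
        print(point, pixels)
```

Under these assumptions, each surface point yields one pixel per view, so the number of candidate matches grows with the number of views rather than with the full pixel count of every view, which is the kind of complexity reduction the abstract describes.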

arXiv information

Authors	Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy
Published	2025-03-11 17:50:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
