Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Summary

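Recent advances in generative AI have revealed significant potential for 3D content creation.
However, current methods either apply a pre-trained 2D diffusion model through time-consuming score distillation sampling (SDS), or use a direct 3D diffusion model that is trained on limited 3D data and therefore loses generation diversity.
This work addresses the problem by employing a multi-view 2.5D diffusion model fine-tuned from a pre-trained 2D diffusion model.
The multi-view 2.5D diffusion directly models the structural distribution of 3D data while retaining the strong generalization ability of the original 2D diffusion model, bridging the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation.
During inference, the 2.5D diffusion generates multi-view normal maps, and a novel differentiable rasterization scheme fuses these almost consistent normal maps into a consistent 3D model.
We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry.
Our method is a one-pass diffusion process and requires no SDS optimization as post-processing.
Through extensive experiments, we demonstrate that direct 2.5D generation with the specially designed fusion scheme achieves diverse, mode-seeking-free, high-fidelity 3D content generation in only 10 seconds.
Project page: https://nju-3dv.github.io/projects/direct25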
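
The fusion step described in the summary has to reconcile normal maps that agree only approximately across views. As a minimal sketch of the kind of per-pixel objective such a step could minimize, the snippet below computes a masked cosine-distance loss between normal maps rendered from a candidate 3D model and the generated normal maps. This is an illustration under stated assumptions, not the authors' implementation: the function name `normal_consistency_loss`, the array shapes, and the choice of cosine distance are all hypothetical, and the differentiable rasterization that would produce `rendered` is omitted entirely.

```python
import numpy as np


def normal_consistency_loss(rendered, target, mask, eps=1e-8):
    """Mean (1 - cosine similarity) over foreground pixels of all views.

    rendered, target: float arrays of shape (V, H, W, 3), one normal map per view.
    mask: boolean array of shape (V, H, W), True on foreground pixels.
    (Hypothetical helper; shapes and loss form are assumptions for illustration.)
    """
    # Normalize both sets of normals to unit length so the dot product is a cosine.
    r = rendered / (np.linalg.norm(rendered, axis=-1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)

    # Per-pixel cosine distance: 0 when normals agree, 2 when they are opposite.
    per_pixel = 1.0 - np.sum(r * t, axis=-1)

    # Average over foreground pixels only; background normals are undefined.
    n_fg = int(mask.sum())
    if n_fg == 0:
        return 0.0
    return float(per_pixel[mask].sum() / n_fg)
```

In an actual pipeline this quantity would be computed inside an autodiff framework so that gradients flow back to the 3D representation; NumPy is used here only to keep the arithmetic easy to check.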

Abstract (original)

Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or a direct 3D diffusion model trained on limited 3D data losing generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data, while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated using the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the almost consistent multi-view normal maps into a consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that, our direct 2.5D generation with the specially-designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.

arXiv information

Authors	Yuanxun Lu, Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao
Published	2023-11-27 16:26:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
