IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

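Summary

Recent progress in Text-to-3D techniques has been driven by distilling knowledge from powerful large text-to-image diffusion models (LDMs).
Even so, existing Text-to-3D approaches often suffer from over-saturation, insufficient detail, and unrealistic outputs.
This work presents a new strategy that addresses these problems by using explicitly synthesized multi-view images.
The approach uses LDM-powered image-to-image pipelines to generate high-quality posed images from renderings of a coarse 3D model.
The generated images largely alleviate the problems above, but because large diffusion models are inherently generative, view inconsistency and significant content variance remain, which makes the images hard to use effectively.
To overcome this hurdle, the authors propose integrating a discriminator together with a novel Diffusion-GAN dual training strategy to guide the training of the 3D model.
For the discriminator, the synthesized multi-view images are treated as real data, while renderings of the 3D model being optimized serve as fake data.
A comprehensive set of experiments demonstrates the method's effectiveness over baseline approaches.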

Summary (original)

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, challenges such as view inconsistency and significant content variance persist due to the inherent generative nature of large diffusion models, posing extensive difficulties in leveraging these images effectively. To overcome this hurdle, we advocate integrating a discriminator alongside a novel Diffusion-GAN dual training strategy to guide the training of 3D models. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data. We conduct a comprehensive set of experiments that demonstrate the effectiveness of our method over baseline approaches.
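
The real/fake assignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss form, so the logistic (non-saturating) GAN loss, the function names, and the `gan_weight` parameter below are all assumptions made for illustration. The discriminator logits are plain floats standing in for the outputs of a real network.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits, fake_logits):
    """Logistic GAN loss (assumed form, not taken from the paper).

    real_logits: discriminator outputs on the synthesized multi-view images
    fake_logits: discriminator outputs on renderings of the 3D model
    """
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits):
    """Non-saturating loss for the 3D model: its renderings should score as real."""
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)


def dual_loss(distillation_loss, fake_logits, gan_weight=1.0):
    """Hypothetical combination of a diffusion distillation term and the GAN term."""
    return distillation_loss + gan_weight * generator_loss(fake_logits)
```

Under these assumptions, the discriminator loss falls as it separates the two sets of images, and the 3D model's loss falls as its renderings become harder to tell apart from the synthesized views.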

arXiv information

Authors	Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
Published	2023-08-22 14:39:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
