Control3D: Towards Controllable Text-to-3D Generation

Summary

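Recent remarkable progress in large-scale text-to-image diffusion models has driven a major advance in text-to-3D generation, which aims to create 3D content from a given text prompt alone.
However, existing text-to-3D techniques lack a capability that is crucial to the creative process: interactively controlling and shaping the synthesized 3D content according to the user's desired specifications (e.g., a sketch).
To mitigate this problem, we present Control3D, the first attempt at text-to-3D generation conditioned on an additional hand-drawn sketch, which strengthens user controllability.
Specifically, a 2D conditioned diffusion model (ControlNet) is repurposed to guide the learning of a 3D scene parameterized as a NeRF, encouraging each view of the 3D scene to match the given text prompt and hand-drawn sketch.
In addition, a pre-trained differentiable photo-to-sketch model is used to directly estimate the sketch of the images rendered from the synthesized 3D scene.
The estimated sketch and each sampled view are further constrained to be geometrically consistent with the given sketch, yielding more controllable text-to-3D generation.
Extensive experiments demonstrate that the proposed method generates accurate and faithful 3D scenes that closely match the input text prompts and sketches.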
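The guidance step, in which ControlNet steers the NeRF, can be pictured with a toy version of score distillation sampling (SDS), the loss DreamFusion introduced for distilling a 2D diffusion prior into a 3D representation. The summary does not say which distillation loss Control3D uses, so SDS here is an assumption, and this is a minimal sketch rather than the authors' implementation. The NeRF renderer is replaced by an identity map (the trainable parameters are the rendered pixels), and `toy_conditioned_denoiser` is a hypothetical stand-in for a text-and-sketch conditioned noise predictor: it predicts the noise that would be present if the clean image were `condition_target`. Classifier-free guidance, camera sampling, and the real ControlNet are omitted.

```python
import numpy as np

# Toy SDS sketch. ASSUMPTION: Control3D distills ControlNet guidance via SDS;
# the abstract does not state this. All function names are hypothetical.


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_conditioned_denoiser(x_t, alpha_bar, condition_target):
    """Hypothetical stand-in for a text+sketch conditioned noise predictor.

    Returns the noise that would be present in x_t if the clean image were
    `condition_target`, an image assumed to satisfy both prompt and sketch.
    """
    return (x_t - np.sqrt(alpha_bar) * condition_target) / np.sqrt(1.0 - alpha_bar)


def sds_gradient(rendered, condition_target, alpha_bars, rng):
    """One SDS gradient w(t) * (eps_hat - eps) with respect to the rendered image."""
    t = rng.integers(20, 980)                   # skip the most extreme timesteps
    a = alpha_bars[t]
    eps = rng.standard_normal(rendered.shape)   # injected Gaussian noise
    x_t = np.sqrt(a) * rendered + np.sqrt(1.0 - a) * eps
    eps_hat = toy_conditioned_denoiser(x_t, a, condition_target)
    weight = 1.0 - a                            # a common choice of w(t)
    # SDS drops the denoiser Jacobian. With an identity "renderer",
    # d(rendered)/d(params) is the identity, so this is the parameter gradient.
    return weight * (eps_hat - eps)


def optimize(params, condition_target, steps=500, lr=1.0, seed=0):
    """Plain gradient descent on the (identity-rendered) parameters."""
    rng = np.random.default_rng(seed)
    alpha_bars = alpha_bar_schedule()
    for _ in range(steps):
        params = params - lr * sds_gradient(params, condition_target, alpha_bars, rng)
    return params


if __name__ == "__main__":
    target = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # stand-in "desired view"
    result = optimize(np.zeros((4, 4)), target)
    print(np.abs(result - target).max())
```

Because the toy denoiser is exact, the injected noise cancels and, with `lr=1.0`, each step shrinks the gap to `condition_target` by a factor of 1 - sqrt(ᾱ_t(1 - ᾱ_t)). With a real conditioned diffusion model the gradient is noisy and the target is only implicit in the model's predictions.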

Summary (original)

Recent remarkable advances in large-scale text-to-image diffusion models have inspired a significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a given text prompt. However, existing text-to-3D techniques lack a crucial ability in the creative process: interactively controlling and shaping the synthetic 3D content according to users' desired specifications (e.g., a sketch). To alleviate this issue, we present the first attempt at text-to-3D generation conditioned on an additional hand-drawn sketch, namely Control3D, which enhances controllability for users. In particular, a 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF, encouraging each view of the 3D scene to align with the given text prompt and hand-drawn sketch. Moreover, we exploit a pre-trained differentiable photo-to-sketch model to directly estimate the sketch of the image rendered from the synthetic 3D scene. The estimated sketch, along with each sampled view, is further enforced to be geometrically consistent with the given sketch, pursuing more controllable text-to-3D generation. Through extensive experiments, we demonstrate that our proposal can generate accurate and faithful 3D scenes that align closely with the input text prompts and sketches.
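The sketch-consistency idea in the abstract (estimate a sketch from each rendered view, then require it to match the user's sketch) can be illustrated with a deliberately simplified loss. The paper uses a pre-trained differentiable photo-to-sketch network; this sketch swaps in a Sobel gradient-magnitude edge map (`sobel_sketch`, hypothetical) as a crude stand-in, and measures consistency with a mean squared error (`sketch_consistency_loss`, also hypothetical). The abstract does not specify the actual loss form. Every operation is differentiable, so in an autodiff framework the same loss could backpropagate through the renderer to the NeRF parameters.

```python
import numpy as np

# Hypothetical illustration only: a Sobel edge map replaces the paper's
# pre-trained photo-to-sketch network, and MSE is an assumed loss form.

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def sobel_sketch(gray):
    """Edge map of a 2D grayscale image via Sobel gradient magnitude."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]  # shifted view for kernel tap (i, j)
            gx += SOBEL_X[i, j] * window
            gy += SOBEL_Y[i, j] * window
    # Small epsilon keeps the square root differentiable where the gradient is zero.
    return np.sqrt(gx ** 2 + gy ** 2 + 1e-12)


def sketch_consistency_loss(rendered_gray, given_sketch):
    """Mean squared error between the sketch estimated from a view and the given sketch."""
    estimated = sobel_sketch(rendered_gray)
    return float(np.mean((estimated - given_sketch) ** 2))


if __name__ == "__main__":
    square = np.zeros((16, 16))
    square[4:12, 4:12] = 1.0                # toy "rendered view"
    user_sketch = sobel_sketch(square)      # pretend this is the hand-drawn input
    print(sketch_consistency_loss(square, user_sketch))                        # 0.0
    print(sketch_consistency_loss(np.roll(square, 3, axis=1), user_sketch) > 0)  # True
```

A view whose edges line up with the given sketch scores zero, while a view whose silhouette is shifted is penalized, which is the kind of geometric agreement the abstract describes.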

arXiv information

Authors	Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei
Published	2023-11-09 15:50:32+00:00
arXiv site	arxiv_id (pdf)

Provider, services used

arxiv.jp, Google