Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

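Abstract

3D generative models have greatly improved artists' workflows, but existing diffusion models for 3D generation suffer from slow generation and poor generalization.
To address this, we propose Hunyuan3D-1.0, a two-stage approach that comes in a lite version and a standard version, both of which support text- and image-conditioned generation.
In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB images in approximately 4 seconds.
These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the task from single-view reconstruction to multi-view reconstruction.
In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in approximately 7 seconds.
The reconstruction network learns to handle the noise and inconsistency introduced by multi-view diffusion, and it leverages the information available in the condition image to recover the 3D structure efficiently.
Our framework includes a text-to-image model, namely Hunyuan-DiT, which makes it a unified framework supporting both text- and image-conditioned 3D generation.
Our standard version has 3x more parameters than our lite version and other existing models.
Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the generated assets.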
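
The data flow described above (optional text-to-image, then multi-view diffusion, then feed-forward reconstruction conditioned on both the views and the condition image) can be sketched in a few lines of Python. This is only an illustrative sketch of the control flow, not the authors' implementation: the class `TwoStagePipeline`, the stub functions, the string and dict placeholder types, and the choice of six views are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Image = str   # placeholder type (assumption): a real pipeline would pass image tensors
Asset = dict  # placeholder type (assumption): a real pipeline would return a mesh


@dataclass
class TwoStagePipeline:
    """Hypothetical wrapper that mirrors the two-stage flow in the abstract."""
    text_to_image: Callable[[str], Image]                # role played by Hunyuan-DiT
    multiview_diffusion: Callable[[Image], List[Image]]  # stage 1
    reconstruct: Callable[[Image, List[Image]], Asset]   # stage 2

    def generate(self, prompt: Optional[str] = None,
                 image: Optional[Image] = None) -> Asset:
        if (prompt is None) == (image is None):
            raise ValueError("provide exactly one of prompt or image")
        # Text input is first converted into a condition image.
        condition = image if image is not None else self.text_to_image(prompt)
        # Stage 1: generate several views of the object from the condition image.
        views = self.multiview_diffusion(condition)
        # Stage 2: reconstruct from the views AND the condition image.
        return self.reconstruct(condition, views)


# Stub stages so the sketch runs without any model weights (all hypothetical).
def stub_text_to_image(prompt: str) -> Image:
    return f"image<{prompt}>"


def stub_multiview(condition: Image) -> List[Image]:
    return [f"{condition}@view{i}" for i in range(6)]  # view count is arbitrary here


def stub_reconstruct(condition: Image, views: List[Image]) -> Asset:
    return {"condition": condition, "num_views": len(views)}


pipeline = TwoStagePipeline(stub_text_to_image, stub_multiview, stub_reconstruct)
text_asset = pipeline.generate(prompt="a chair")
image_asset = pipeline.generate(image="photo.png")
```

Passing the stages in as callables keeps the text and image entry points on a single code path, which is the sense in which the abstract calls the framework unified.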

Abstract (original)

While 3D generative models have greatly improved artists’ workflows, the existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address this issue, we propose a two-stage approach named Hunyuan3D-1.0 including a lite version and a standard version, that both support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the tasks from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset given the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle noises and in-consistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure. Our framework involves the text-to-image model, i.e., Hunyuan-DiT, making it a unified framework to support both text- and image-conditioned 3D generation. Our standard version has 3x more parameters than our lite and other existing model. Our Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets.

arXiv information

Authors	Xianghui Yang, Huiwen Shi, Bowen Zhang, Fan Yang, Jiacheng Wang, Hongxu Zhao, Xinhai Liu, Xinzhou Wang, Qingxiang Lin, Jiaao Yu, Lifu Wang, Zhuo Chen, Sicong Liu, Yuhong Liu, Yong Yang, Di Wang, Jie Jiang, Chunchao Guo
Published	2024-11-05 14:33:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
