Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

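Abstract

Synthesizing multi-view 3D from a single image is an important and challenging task.
To this end, Zero-1-to-3 methods extend a 2D latent diffusion model to the 3D domain.
These methods generate a target-view image conditioned on a single-view source image and the camera pose.
However, the one-to-one scheme adopted by Zero-1-to-3 makes it difficult to build geometric and visual consistency across views, especially for complex objects.
To address this problem, we propose Cascade-Zero123, a cascade generation framework built from two Zero-1-to-3 models that progressively extracts 3D information from the source image.
Specifically, a self-prompting mechanism first generates several nearby views.
These views are then fed into the second-stage model, together with the source image, as generation conditions.
Using the self-prompted views as supplementary information, Cascade-Zero123 generates novel-view images that are more consistent than those of Zero-1-to-3.
The improvement is significant for a variety of complex and challenging scenes, including insects, humans, transparent objects, and multiple stacked objects. The project page is at https://cascadezero123.github.io/.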
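
The two-stage flow described above can be sketched roughly as follows. This is only an illustrative outline and not the authors' implementation: the names `View`, `cascade_novel_view`, `fake_stage1`, `fake_stage2`, and the `(elevation, azimuth, radius)` pose encoding are all assumptions made for this example, and the two stages are replaced by trivial stand-in functions so the control flow can be run without any diffusion model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical pose encoding (an assumption for this sketch):
# (elevation, azimuth, radius) offsets relative to the source view.
Pose = Tuple[float, float, float]
SOURCE_POSE: Pose = (0.0, 0.0, 0.0)


@dataclass(frozen=True)
class View:
    image: Any  # placeholder for an image tensor
    pose: Pose  # pose of this view relative to the source view


def cascade_novel_view(
    source_image: Any,
    target_pose: Pose,
    nearby_poses: Sequence[Pose],
    stage1: Callable[[Any, Pose], Any],
    stage2: Callable[[Sequence[View], Pose], Any],
) -> Any:
    # Stage 1 (self-prompting): generate each nearby view one-to-one
    # from the source image and a relative pose.
    prompts: List[View] = [View(stage1(source_image, p), p) for p in nearby_poses]
    # Stage 2: condition on the source image together with the
    # self-prompted nearby views to generate the target view.
    conditions = [View(source_image, SOURCE_POSE), *prompts]
    return stage2(conditions, target_pose)


# Stand-in "models" so the sketch runs; real stages would be diffusion models.
def fake_stage1(image: Any, pose: Pose) -> Any:
    return f"{image}@{pose}"


def fake_stage2(conditions: Sequence[View], target_pose: Pose) -> Any:
    return {"n_conditions": len(conditions), "target": target_pose}


result = cascade_novel_view(
    "src",
    (0.0, 90.0, 0.0),
    [(0.0, 15.0, 0.0), (0.0, -15.0, 0.0)],
    fake_stage1,
    fake_stage2,
)
print(result)
```

The point of the sketch is the data flow only: the second stage sees the source image plus every self-prompted view, rather than the source image alone as in the one-to-one setting.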

Abstract (original)

Synthesizing multi-view 3D from one single image is a significant and challenging task. For this goal, Zero-1-to-3 methods aim to extend a 2D latent diffusion model to the 3D scope. These approaches generate the target-view image with a single-view source image and the camera pose as condition information. However, the one-to-one manner adopted in Zero-1-to-3 incurs challenges for building geometric and visual consistency across views, especially for complex objects. We propose a cascade generation framework constructed with two Zero-1-to-3 models, named Cascade-Zero123, to tackle this issue, which progressively extracts 3D information from the source image. Specifically, a self-prompting mechanism is designed to generate several nearby views at first. These views are then fed into the second-stage model along with the source image as generation conditions. With self-prompted multiple views as the supplementary information, our Cascade-Zero123 generates more highly consistent novel-view images than Zero-1-to-3. The promotion is significant for various complex and challenging scenes, involving insects, humans, transparent objects, and stacked multiple objects etc. The project page is at https://cascadezero123.github.io/.

arXiv information

Authors	Yabo Chen, Jiemin Fang, Yuyang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian
Published	2023-12-07 16:49:09+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
