Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

Abstract

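Zero-shot novel view synthesis (NVS) from a single image is an essential problem in 3D object understanding.
Recent approaches that leverage pre-trained generative models can synthesize high-quality novel views from in-the-wild inputs, but they still struggle to maintain 3D consistency across different views.
In this paper, we present Consistent-1-to-3, a generative framework that significantly mitigates this problem.
Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
We design a scene representation transformer and a view-conditioned diffusion model to perform these two stages, respectively.
Within the models, to enforce 3D consistency, we propose epipolar-guided attention to incorporate geometric constraints and multi-view attention to better aggregate multi-view information.
Finally, we design a hierarchical generation paradigm that produces long sequences of consistent views, allowing a full 360-degree observation of the provided object image.
Qualitative and quantitative evaluations on multiple datasets demonstrate the effectiveness of the proposed mechanisms compared with state-of-the-art approaches.
Our project page is at https://jianglongye.com/consistent123/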
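
The abstract does not say how the epipolar constraint enters the attention layers, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a known fundamental matrix F (with the convention x_target^T F x_source = 0) and builds a boolean mask that would let a source pixel attend only to target pixels near its epipolar line. All function names, the pixel threshold, and the toy rectified-stereo matrix are hypothetical and chosen for illustration.

```python
import math


def epipolar_line(F, pixel):
    """Return (a, b, c) of the target-image line a*u + b*v + c = 0 for a source pixel."""
    u, v = pixel
    x = (u, v, 1.0)  # homogeneous source pixel
    return tuple(sum(F[r][k] * x[k] for k in range(3)) for r in range(3))


def point_line_distance(line, pixel):
    """Perpendicular distance (in pixels) from a target pixel to the line."""
    a, b, c = line
    u, v = pixel
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * u + b * v + c) / norm


def epipolar_mask(F, source_pixel, width, height, threshold=1.0):
    """mask[v][u] is True where target pixel (u, v) lies within `threshold` of the line."""
    line = epipolar_line(F, source_pixel)
    return [
        [point_line_distance(line, (u, v)) <= threshold for u in range(width)]
        for v in range(height)
    ]


# Toy assumption: a rectified stereo pair (pure translation along x),
# for which F is the skew-symmetric matrix of t = (1, 0, 0) and
# epipolar lines are horizontal image rows.
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

mask = epipolar_mask(F_RECTIFIED, source_pixel=(2, 1), width=4, height=3, threshold=0.5)
```

In this toy setup only the target row with the same v coordinate as the source pixel is left unmasked. An attention layer could use such a mask to suppress scores for target locations that are geometrically inconsistent with the query pixel; whether the paper applies a hard mask, a soft weighting, or sampling along the line is not stated in the abstract.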

Abstract (original)

Zero-shot novel view synthesis (NVS) from a single image is an essential problem in 3D object understanding. While recent approaches that leverage pre-trained generative models can synthesize high-quality novel views from in-the-wild inputs, they still struggle to maintain 3D consistency across different views. In this paper, we present Consistent-1-to-3, which is a generative framework that significantly mitigates this issue. Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions. We design a scene representation transformer and view-conditioned diffusion model for performing these two stages respectively. Inside the models, to enforce 3D consistency, we propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information. Finally, we design a hierarchical generation paradigm to generate long sequences of consistent views, allowing a full 360-degree observation of the provided object image. Qualitative and quantitative evaluations over multiple datasets demonstrate the effectiveness of the proposed mechanisms against state-of-the-art approaches. Our project page is at https://jianglongye.com/consistent123/

arXiv information

Authors	Jianglong Ye, Peng Wang, Kejie Li, Yichun Shi, Heng Wang
Published	2023-10-04 17:58:57+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
