SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

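Summary

SparseFusion is a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation.
Existing approaches typically build on neural rendering with re-projected features, but they cannot generate unseen regions or handle the uncertainty that arises under large viewpoint changes.
Alternative methods treat the problem as a (probabilistic) 2D synthesis task; they can generate plausible 2D images but do not infer a consistent underlying 3D representation.
The authors find, however, that this trade-off between 3D consistency and probabilistic image generation does not need to exist.
In fact, they show that geometric consistency and generative inference can be complementary in a mode-seeking behavior.
By distilling a 3D-consistent scene representation from a view-conditioned latent diffusion model, the method recovers a plausible 3D representation whose renderings are both accurate and realistic.
The approach is evaluated across 51 categories of the CO3D dataset and outperforms existing methods on both distortion and perception metrics for sparse-view novel view synthesis.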

Abstract (original)

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.

arXiv information

Authors	Zhizhuo Zhou, Shubham Tulsiani
Published	2022-12-01 18:59:55+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
