UpFusion: Novel View Diffusion from Unposed Sparse View Observations

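Summary

We propose UpFusion, a system that performs novel view synthesis and infers a 3D representation of an object from a sparse set of reference images that come without pose information. Current sparse-view 3D inference methods generally depend on camera poses to aggregate information from the input views geometrically, so they are not robust in the wild, where such information may be unavailable or inaccurate. UpFusion avoids this requirement by learning to use the available images implicitly as context in a conditional generative model that synthesizes novel views. To exploit the input views, it incorporates two complementary forms of conditioning into a diffusion model: a) query-view aligned features inferred with a scene-level transformer, and b) intermediate attention layers that can directly observe the input image tokens. We show that this mechanism generates high-fidelity novel views and that synthesis quality improves as additional (unposed) images are provided. We evaluate the approach on the Co3Dv2 and Google Scanned Objects datasets and show its advantages over pose-reliant sparse-view methods and over single-view methods that cannot use additional views. Finally, we show that the trained model generalizes beyond its training categories and can even reconstruct generic objects from self-captured images in the wild.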

Abstract (original)

We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information. Current sparse-view 3D inference methods typically rely on camera poses to geometrically aggregate information from input views, but are not robust in-the-wild when such information is unavailable/inaccurate. In contrast, UpFusion sidesteps this requirement by learning to implicitly leverage the available images as context in a conditional generative model for synthesizing novel views. We incorporate two complementary forms of conditioning into diffusion models for leveraging the input views: a) via inferring query-view aligned features using a scene-level transformer, b) via intermediate attentional layers that can directly observe the input image tokens. We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images. We evaluate our approach on the Co3Dv2 and Google Scanned Objects datasets and demonstrate the benefits of our method over pose-reliant sparse-view methods as well as single-view methods that cannot leverage additional views. Finally, we also show that our learned model can generalize beyond the training categories and even allow reconstruction from self-captured images of generic objects in-the-wild.
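Conditioning (b) above, attention layers that observe the input image tokens directly, can be pictured with a toy cross-attention step. The sketch below is not the authors' implementation: it assumes a single head with no learned projections, and the names `cross_attention`, `latents`, and `view_a` to `view_c` are hypothetical. It only illustrates why such a layer needs no camera poses: the output does not depend on the order of the context tokens, and extra views simply extend the token set.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Each query token becomes a softmax-weighted average of the context tokens.

    Assumption: single head, identity projections (a real model would learn
    separate query/key/value projections).
    """
    dim = len(context[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [scale * sum(qi * ci for qi, ci in zip(q, c)) for c in context]
        weights = softmax(scores)
        # Weighted average of the context tokens, one coordinate at a time.
        outputs.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return outputs

# Hypothetical toy tokens: each "view" is a short list of 4-dimensional tokens.
view_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
view_b = [[0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
view_c = [[0.0, 0.0, 0.0, 1.0]]

# Hypothetical latent tokens of the view being synthesized.
latents = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]

out_two = cross_attention(latents, view_a + view_b)
out_swapped = cross_attention(latents, view_b + view_a)  # same views, other order
out_three = cross_attention(latents, view_a + view_b + view_c)  # one more view
```

Here `out_two` and `out_swapped` agree up to floating-point rounding, and `out_three` has the same shape as `out_two` even though it attends over more tokens.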

arXiv information

Authors	Bharath Raj Nagoor Kani,Hsin-Ying Lee,Sergey Tulyakov,Shubham Tulsiani
Published	2024-01-04 17:59:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
