MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

Summary

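We present MVSplat360, a feed-forward approach to 360° novel view synthesis (NVS) of diverse real-world scenes from only sparse observations.
This setting is inherently ill-posed: the input views overlap minimally and provide insufficient visual information, which makes high-quality results hard to achieve with conventional methods.
MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation.
Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where the features serve as pose and visual cues that guide the denoising process and produce photorealistic, 3D-consistent views.
The model is end-to-end trainable and supports rendering arbitrary views from as few as 5 sparse input views.
To evaluate MVSplat360, the authors introduce a new benchmark built on the challenging DL3DV-10K dataset, on which MVSplat360 achieves better visual quality than state-of-the-art methods on wide-sweeping and even 360° NVS tasks.
Experiments on the existing RealEstate10K benchmark also confirm the model's effectiveness.
Video results are available on the project page: https://donydchen.github.io/mvsplat360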

Abstract (original)

We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging for conventional methods to achieve high-quality results. Our MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views. To evaluate MVSplat360’s performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks. Experiments on the existing benchmark RealEstate10K also confirm the effectiveness of our model. The video results are available on our project page: https://donydchen.github.io/mvsplat360.
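
The data flow described in the abstract (features rendered by the 3DGS model at latent resolution, then used as cues for the video denoiser) can be illustrated with a shape-level sketch. This is only a toy illustration, not the authors' implementation: the abstract does not say how the rendered features are injected, so channel-wise concatenation with the noisy latents is an assumption here, and every name and size (`NUM_TARGET_VIEWS`, `FEATURE_CHANNELS`, `IMAGE_HW`, `VAE_DOWNSAMPLE`, `build_denoiser_input`) is hypothetical.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the abstract.
NUM_TARGET_VIEWS = 14    # number of novel views denoised together as a "video"
LATENT_CHANNELS = 4      # channels of the diffusion model's latent
FEATURE_CHANNELS = 4     # channels of the feature map rendered from the Gaussians
IMAGE_HW = (256, 448)    # output image height and width
VAE_DOWNSAMPLE = 8       # spatial reduction from image space to latent space


def latent_hw(image_hw, factor=VAE_DOWNSAMPLE):
    """Spatial size of the latent grid for a given image size."""
    h, w = image_hw
    return h // factor, w // factor


def build_denoiser_input(noisy_latents, rendered_features):
    """Attach rendered features to the noisy latents along the channel axis.

    Assumption: conditioning is done by channel concatenation. The features
    must already be rendered at latent resolution, one map per target view.
    """
    same_views = noisy_latents.shape[0] == rendered_features.shape[0]
    same_grid = noisy_latents.shape[2:] == rendered_features.shape[2:]
    if not (same_views and same_grid):
        raise ValueError("latents and features must share view count and grid size")
    return np.concatenate([noisy_latents, rendered_features], axis=1)


rng = np.random.default_rng(0)
h, w = latent_hw(IMAGE_HW)

# Stand-ins for the real tensors: Gaussian noise latents and rendered features.
noisy = rng.standard_normal((NUM_TARGET_VIEWS, LATENT_CHANNELS, h, w))
feats = rng.standard_normal((NUM_TARGET_VIEWS, FEATURE_CHANNELS, h, w))

x = build_denoiser_input(noisy, feats)
print(x.shape)  # (views, latent + feature channels, latent height, latent width)
```

The point of the sketch is that rendering directly at latent resolution lets the cues line up pixel-for-pixel with the latents being denoised, without passing through image space first.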

arXiv information

Authors	Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, Jianfei Cai
Published	2024-11-07 17:59:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
