Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors

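Abstract

Synthesizing consistent, photorealistic 3D scenes is an open problem in computer vision.
Video diffusion models generate impressive videos but cannot directly synthesize 3D representations, so the sequences they generate lack 3D consistency.
In addition, large-scale 3D training data is scarce, which makes it difficult to train generative 3D models directly.
This work presents Generative Gaussian Splatting (GGS), a novel approach that integrates a 3D representation with a pre-trained latent video diffusion model.
Specifically, the model synthesizes a feature field parameterized via 3D Gaussian primitives.
The feature field is then either rendered to feature maps and decoded into multi-view images, or directly upsampled into a 3D radiance field.
The approach is evaluated on two common benchmark datasets for scene synthesis, RealEstate10K and ScanNet+, where the proposed GGS model significantly improves both the 3D consistency of the generated multi-view images and the quality of the generated 3D scenes over all relevant baselines.
Compared to a similar model without a 3D representation, GGS improves FID on the generated 3D scenes by ~20% on both RealEstate10K and ScanNet+.
Project page: https://katjaschwarz.github.io/ggs/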
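
The abstract does not say how the Gaussian feature field is rendered to feature maps. As a rough illustration only, the sketch below applies the standard front-to-back alpha-compositing rule from 3D Gaussian splatting to feature vectors instead of colors, for a single pixel. The function name `composite_features` and its inputs are hypothetical and not taken from the GGS code; the sketch also assumes the Gaussians are already sorted near to far and that their opacities at this pixel have been evaluated.

```python
import numpy as np


def composite_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian features for one pixel.

    Illustrative sketch only; not the authors' implementation.

    features: (N, C) array, one C-dimensional feature per Gaussian,
              sorted from nearest to farthest along the pixel's ray.
    alphas:   (N,) array of opacities in [0, 1], already evaluated at
              this pixel for each projected Gaussian.
    Returns the blended (C,) feature and the (N,) blending weights.
    """
    features = np.asarray(features, dtype=np.float64)
    alphas = np.asarray(alphas, dtype=np.float64)
    # Transmittance in front of Gaussian i: product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Each Gaussian contributes its opacity times the light that reaches it.
    weights = alphas * transmittance
    return weights @ features, weights


# Two Gaussians with 2-channel features, both half transparent.
pixel_feature, blend_weights = composite_features(
    [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]
)
print(pixel_feature, blend_weights)
```

Repeating this per pixel gives a feature map, which a decoder could then turn into an image. The decoder and the diffusion backbone are outside the scope of this sketch.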

Abstract (original)

Synthesizing consistent and photorealistic 3D scenes is an open problem in computer vision. Video diffusion models generate impressive videos but cannot directly synthesize 3D representations, i.e., lack 3D consistency in the generated sequences. In addition, directly training generative 3D models is challenging due to a lack of 3D training data at scale. In this work, we present Generative Gaussian Splatting (GGS) — a novel approach that integrates a 3D representation with a pre-trained latent video diffusion model. Specifically, our model synthesizes a feature field parameterized via 3D Gaussian primitives. The feature field is then either rendered to feature maps and decoded into multi-view images, or directly upsampled into a 3D radiance field. We evaluate our approach on two common benchmark datasets for scene synthesis, RealEstate10K and ScanNet+, and find that our proposed GGS model significantly improves both the 3D consistency of the generated multi-view images, and the quality of the generated 3D scenes over all relevant baselines. Compared to a similar model without 3D representation, GGS improves FID on the generated 3D scenes by ~20% on both RealEstate10K and ScanNet+. Project page: https://katjaschwarz.github.io/ggs/

arXiv information

Authors	Katja Schwarz, Norman Mueller, Peter Kontschieder
Published	2025-03-17 15:24:04+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
