StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation

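Abstract

Recent advances in large reconstruction and generative models have significantly improved scene reconstruction and novel view generation.
However, because of compute limitations, each inference with these large models is confined to a small area, which makes long-range consistent scene generation difficult.
To address this, we propose StarGen, a novel framework that uses a pre-trained video diffusion model in an autoregressive manner for long-range scene generation.
The generation of each video clip is conditioned on the 3D warping of spatially adjacent images and on the temporally overlapping image from previously generated clips, which improves spatiotemporal consistency in long-range scene generation with precise pose control.
The spatiotemporal condition is compatible with various input conditions and facilitates diverse tasks, including sparse view interpolation, perpetual view generation, and layout-conditioned city generation.
Quantitative and qualitative evaluations demonstrate StarGen's superior scalability, fidelity, and pose accuracy compared with state-of-the-art methods.
Project page: https://zju3dv.github.io/StarGen.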
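
The abstract does not spell out how the 3D warping is computed. As a rough illustration only, the sketch below shows the standard depth-based pinhole reprojection of a single pixel from a source view into a target view. The function name `warp_pixel`, the `(fx, fy, cx, cy)` intrinsics tuple, and the availability of a per-pixel depth value are all assumptions made for this example, not details taken from StarGen.

```python
from typing import Optional, Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def warp_pixel(
    u: float,
    v: float,
    depth: float,
    k_src: Intrinsics,
    k_tgt: Intrinsics,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Reproject pixel (u, v) with known depth from a source view to a target view.

    `rotation` (3x3) and `translation` (3,) map source-camera coordinates to
    target-camera coordinates. Returns None if the point lies behind the target camera.
    """
    fx, fy, cx, cy = k_src
    # Unproject the pixel to a 3D point in the source camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the rigid transform into the target camera frame.
    xt, yt, zt = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zt <= 0:
        return None
    # Project onto the target image plane.
    fx_t, fy_t, cx_t, cy_t = k_tgt
    return (fx_t * xt / zt + cx_t, fy_t * yt / zt + cy_t)
```

With an identity rotation and zero translation the pixel maps to itself. A sideways shift of 0.1 units at depth 2 with a focal length of 100 pixels moves the pixel by about 5 pixels horizontally.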

Abstract (original)

Recent advances in large reconstruction and generative models have significantly improved scene reconstruction and novel view generation. However, due to compute limitations, each inference with these large models is confined to a small area, making long-range consistent scene generation challenging. To address this, we propose StarGen, a novel framework that employs a pre-trained video diffusion model in an autoregressive manner for long-range scene generation. The generation of each video clip is conditioned on the 3D warping of spatially adjacent images and the temporally overlapping image from previously generated clips, improving spatiotemporal consistency in long-range scene generation with precise pose control. The spatiotemporal condition is compatible with various input conditions, facilitating diverse tasks, including sparse view interpolation, perpetual view generation, and layout-conditioned city generation. Quantitative and qualitative evaluations demonstrate StarGen’s superior scalability, fidelity, and pose accuracy compared to state-of-the-art methods. Project page: https://zju3dv.github.io/StarGen.
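
To make the autoregressive scheme with temporally overlapping frames more concrete, here is a minimal sketch of how a long camera trajectory could be split into clips that share frames with their predecessor. It covers only the scheduling. The diffusion model and the warping-based spatial condition are hidden behind a hypothetical `generate_clip` callback, and the names `clip_windows` and `generate_long_sequence`, as well as the default overlap of one frame, are assumptions for illustration rather than StarGen's actual interface.

```python
from typing import Callable, List, Sequence, Tuple


def clip_windows(num_frames: int, clip_len: int, overlap: int = 1) -> List[Tuple[int, int]]:
    """Split frame indices into half-open (start, end) clips sharing `overlap` frames.

    Every clip after the first begins with the last `overlap` frames of the
    previous clip, which serve as its temporal condition.
    """
    if clip_len <= overlap:
        raise ValueError("clip_len must be larger than overlap")
    if num_frames <= 0:
        return []
    windows = []
    start = 0
    stride = clip_len - overlap
    while True:
        end = min(start + clip_len, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


def generate_long_sequence(
    poses: Sequence,
    clip_len: int,
    generate_clip: Callable[[Sequence, list], list],
    overlap: int = 1,
) -> list:
    """Generate one frame per pose, clip by clip.

    `generate_clip(clip_poses, context)` is a hypothetical stand-in for the video
    model: it must return one frame per pose, starting with the context frames.
    """
    frames: list = []
    for start, end in clip_windows(len(poses), clip_len, overlap):
        context = frames[start:start + overlap]  # empty for the first clip
        clip = generate_clip(poses[start:end], context)
        frames.extend(clip[len(context):])  # keep only the newly generated frames
    return frames
```

For example, `clip_windows(10, 4, 1)` yields three clips in which frames 3 and 6 are each shared by two consecutive clips, so every new clip starts from a frame that has already been generated.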

arXiv information

Authors	Shangjin Zhai, Zhichao Ye, Jialin Liu, Weijian Xie, Jiaqi Hu, Zhen Peng, Hua Xue, Danpeng Chen, Xiaomeng Wang, Lei Yang, Nan Wang, Haomin Liu, Guofeng Zhang
Published	2025-04-01 08:18:01+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
