Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

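Abstract

Accurately reconstructing complex dynamic scenes from a single viewpoint remains a challenging problem in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, which demands careful recording setups and severely limits their usefulness in the wild and for embodied AI applications. In this paper, we propose $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline. The pipeline leverages large-scale diffusion priors and, given a video of any scene, generates a synchronous video from any other chosen viewpoint, conditioned on a set of relative camera pose parameters. Our model does not require depth as input and does not explicitly model 3D scene geometry; instead, it performs end-to-end video-to-video translation to achieve its goal efficiently. Although it is trained only on synthetic multi-view video data, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework has the potential to unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.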

Abstract (original)

Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this paper, we propose $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to, given a video of any scene, generate a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input, and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite being trained on synthetic multi-view video data only, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.

arXiv information

Authors	Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
Published	2024-07-05 17:59:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
