Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

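Summary

Accurately reconstructing a complex dynamic scene from a single viewpoint remains a hard problem in computer vision.
Current dynamic novel view synthesis methods typically need videos from many different camera viewpoints, which demands careful recording setups and greatly limits their usefulness both in the wild and for embodied AI applications.
This paper proposes $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors: given a video of any scene, it generates a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters.
The model does not require depth as input and does not explicitly model 3D scene geometry; instead, it performs end-to-end video-to-video translation to achieve its goal efficiently.
Although it is trained only on synthetic multi-view video data, zero-shot real-world generalization experiments show promising results in several domains, including robotics, object permanence, and driving environments.
The authors believe the framework could unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.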

Abstract (original)

Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this paper, we propose $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to, given a video of any scene, generate a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input, and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite being trained on synthetic multi-view video data only, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
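
The abstract says the output video is conditioned on "a set of relative camera pose parameters" but does not say how they are parameterized. As a rough illustration only, the sketch below assumes each camera is described by a 4x4 camera-to-world rigid transform and computes the pose of the target camera in the source camera's frame. The helper names (`make_pose`, `invert_rigid`, `relative_pose`) are hypothetical and are not taken from the paper or its code.

```python
import math


def rotation_y(angle_rad):
    """Return a 3x3 rotation about the y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def make_pose(rotation, translation):
    """Build a 4x4 camera-to-world matrix from a 3x3 rotation and a 3-vector."""
    pose = [list(row) + [float(t)] for row, t in zip(rotation, translation)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose


def invert_rigid(pose):
    """Invert a rigid transform using [R | t]^-1 = [R^T | -R^T t]."""
    r_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    t = [pose[i][3] for i in range(3)]
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(r_t, new_t)


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def relative_pose(source_c2w, target_c2w):
    """Pose of the target camera expressed in the source camera's frame.

    Assumption: both inputs are camera-to-world transforms, so the result
    maps target-camera coordinates into source-camera coordinates.
    """
    return matmul4(invert_rigid(source_c2w), target_c2w)


# Illustrative cameras: the source looks down its own z axis from z = 2,
# the target is rotated 90 degrees about y and placed at x = 2.
source = make_pose(rotation_y(0.0), [0.0, 0.0, 2.0])
target = make_pose(rotation_y(math.pi / 2), [2.0, 0.0, 0.0])
delta = relative_pose(source, target)
```

A relative transform like `delta` depends only on how the two cameras are placed with respect to each other, not on the choice of world frame, which is one plausible reason to condition on relative rather than absolute poses. The paper itself may use a different parameterization.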

arXiv information

Authors	Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
Published	2024-05-23 17:59:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
