Novel View Synthesis with Pixel-Space Diffusion Models

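Summary

Synthesizing a novel view from a single input image is a challenging task.
Traditionally, the task was approached by estimating scene depth, warping, and inpainting, with machine learning models handling parts of that pipeline.
More recently, generative models have been increasingly employed in novel view synthesis (NVS), often covering the entire system end to end.
In this work, the authors adapt a modern diffusion model architecture for end-to-end NVS in pixel space and substantially outperform previous state-of-the-art (SOTA) techniques.
They explore different ways of encoding geometric information into the network.
Their experiments show that these methods may improve performance, but that their impact is minor compared with using an improved generative model.
They also introduce a novel NVS training scheme that uses single-view datasets, taking advantage of the fact that such datasets are far more abundant than multi-view ones.
This leads to better generalization to scenes with out-of-domain content.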

Summary (original)

Synthesizing a novel view from a single input image is a challenging task. Traditionally, this task was approached by estimating scene depth, warping, and inpainting, with machine learning models enabling parts of the pipeline. More recently, generative models are being increasingly employed in novel view synthesis (NVS), often encompassing the entire end-to-end system. In this work, we adapt a modern diffusion model architecture for end-to-end NVS in the pixel space, substantially outperforming previous state-of-the-art (SOTA) techniques. We explore different ways to encode geometric information into the network. Our experiments show that while these methods may enhance performance, their impact is minor compared to utilizing improved generative models. Moreover, we introduce a novel NVS training scheme that utilizes single-view datasets, capitalizing on their relative abundance compared to their multi-view counterparts. This leads to improved generalization capabilities to scenes with out-of-domain content.

arXiv information

Authors	Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky
Published	2024-11-12 12:58:33+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
