MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Summary

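This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images given geometry (depth maps and poses).
Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion generates all images concurrently with global awareness, at high resolution and with rich content, which effectively addresses the error accumulation common in those earlier models.
In particular, MVDiffusion incorporates a correspondence-aware attention mechanism that enables effective cross-view interaction.
This mechanism underpins three key modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies the spatial coverage between images, and 3) a super-resolution module that upscales to high-resolution outputs.
For panoramic imagery, MVDiffusion can generate high-resolution photorealistic images of up to 1024$\times$1024 pixels.
For geometry-conditioned multi-view image generation, MVDiffusion demonstrates the first method capable of generating a texture map of a scene mesh.
The project page is at https://mvdiffusion.github.io.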

Summary (original)

This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from panorama or multi-view images given geometry (depth maps and poses). Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion concurrently generates all images with a global awareness, encompassing high resolution and rich content, effectively addressing the error accumulation prevalent in preceding models. MVDiffusion specifically incorporates a correspondence-aware attention mechanism, enabling effective cross-view interaction. This mechanism underpins three pivotal modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies spatial coverage between images, and 3) a super-resolution module that upscales into high-resolution outputs. In terms of panoramic imagery, MVDiffusion can generate high-resolution photorealistic images up to 1024$\times$1024 pixels. For geometry-conditioned multi-view image generation, MVDiffusion demonstrates the first method capable of generating a textured map of a scene mesh. The project page is at https://mvdiffusion.github.io.
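
The abstract relies on pixel-to-pixel correspondences between views but does not spell out how they are obtained. As a minimal sketch for the panorama case only, the snippet below assumes that all perspective crops share one camera center and differ by a yaw rotation, so a pixel can be mapped between two crops by back-projecting it to a ray, rotating the ray, and re-projecting it. The function names (`rot_y`, `correspond`), the 512-pixel crop size, the 90-degree field of view, and the 45-degree yaw step are illustrative assumptions, not values taken from the paper.

```python
import math

def rot_y(deg):
    """Camera-to-world rotation for a yaw of `deg` degrees about the vertical (y) axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (the inverse, for a rotation)."""
    return [[m[c][r] for c in range(3)] for r in range(3)]

def correspond(u, v, yaw_src, yaw_dst, size=512, fov_deg=90.0):
    """Map pixel (u, v) of the source crop to the destination crop.

    Assumes pinhole cameras with a shared center (hypothetical setup).
    Returns None when the point is not visible in the destination crop.
    """
    f = (size / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    c = size / 2                                           # principal point
    ray_src = [(u - c) / f, (v - c) / f, 1.0]              # back-project to a ray
    ray_world = matvec(rot_y(yaw_src), ray_src)            # source camera -> world
    ray_dst = matvec(transpose(rot_y(yaw_dst)), ray_world) # world -> destination camera
    if ray_dst[2] <= 0:                                    # behind the destination camera
        return None
    u2 = f * ray_dst[0] / ray_dst[2] + c
    v2 = f * ray_dst[1] / ray_dst[2] + c
    if not (0 <= u2 < size and 0 <= v2 < size):            # outside the image bounds
        return None
    return (u2, v2)

# A pixel 22.5 degrees to the right of the source view's center lands
# 22.5 degrees to the left of center in a view yawed 45 degrees further right.
offset = 256 * math.tan(math.radians(22.5))
print(correspond(256 + offset, 256, yaw_src=0, yaw_dst=45))
```

A correspondence-aware attention layer could use such a mapping to decide which pixels in neighboring views a given pixel attends to. For the geometry-conditioned case, the same idea would need per-pixel depth and full camera poses rather than a pure rotation.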

arXiv information

Authors	Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
Published	2023-07-03 15:19:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
