Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

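Abstract

We present a diffusion-based framework that performs aligned novel view image and geometry generation through a warping-and-inpainting methodology.
Unlike previous methods, which require densely posed images or pose-embedded generative models restricted to in-domain views, our method uses off-the-shelf geometry predictors to predict the partial geometry seen from the reference images, and formulates novel view synthesis as an inpainting task for both image and geometry.
To ensure accurate alignment between the generated images and geometry, we propose cross-modal attention distillation, in which attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference.
This multi-task approach produces a synergistic effect, promoting geometrically robust image synthesis as well as well-defined geometry prediction.
We further introduce proximity-based mesh conditioning, which integrates depth and normal cues, interpolates the point cloud, and prevents erroneously predicted geometry from influencing the generation process.
Empirically, our method achieves high-fidelity extrapolative view synthesis for both image and geometry across a variety of unseen scenes, delivers competitive reconstruction quality in interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion.
The project page is available at https://cvlab-kaist.github.io/MoAI.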
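To make the warping half of the warping-and-inpainting formulation concrete, here is a minimal NumPy sketch of depth-based forward warping. It assumes a pinhole camera whose intrinsics `K` are shared by both views and a known relative pose `R`, `t`; the function name `forward_warp` and all of its parameters are hypothetical, not taken from the paper. Each reference pixel is back-projected with its depth, moved into the target camera frame, and splatted to the nearest target pixel with a z-buffer. The returned `mask` marks the target pixels that no reference pixel reached, i.e. the holes an inpainting model would have to fill, and the returned depth is one way a partial geometry map for the new view could be obtained.

```python
import numpy as np


def forward_warp(image, depth, K, R, t, out_shape):
    """Hypothetical helper: forward-warp a reference image into a target view.

    image: (H, W, C) reference colors; depth: (H, W) reference depth (z > 0)
    K: (3, 3) pinhole intrinsics, assumed shared by both views
    R, t: rotation (3, 3) and translation (3,) mapping reference-camera
          coordinates to target-camera coordinates
    Returns (warped image, target depth, mask); mask is False where no
    reference pixel landed, i.e. the region left for inpainting.
    """
    H, W = depth.shape
    Ht, Wt = out_shape
    C = image.shape[-1]

    # Back-project every reference pixel to a 3D point in the reference frame.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Move the points into the target camera frame and project them.
    pts_t = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_t[2]
    front = z > 1e-8  # drop points behind the target camera
    proj = K @ pts_t[:, front]
    ut = np.round(proj[0] / proj[2]).astype(np.int64)
    vt = np.round(proj[1] / proj[2]).astype(np.int64)
    zv = z[front]
    colors = image.reshape(-1, C)[front]

    # Keep only projections that land inside the target image.
    inside = (ut >= 0) & (ut < Wt) & (vt >= 0) & (vt < Ht)
    ut, vt, zv, colors = ut[inside], vt[inside], zv[inside], colors[inside]

    # Z-buffer: when several points hit one pixel, keep the nearest one.
    lin = vt * Wt + ut
    order = np.lexsort((zv, lin))  # sort by pixel index, then by depth
    _, first = np.unique(lin[order], return_index=True)
    keep = order[first]

    warped = np.zeros((Ht * Wt, C), dtype=image.dtype)
    tdepth = np.zeros(Ht * Wt)
    mask = np.zeros(Ht * Wt, dtype=bool)
    warped[lin[keep]] = colors[keep]
    tdepth[lin[keep]] = zv[keep]
    mask[lin[keep]] = True
    return warped.reshape(Ht, Wt, C), tdepth.reshape(Ht, Wt), mask.reshape(Ht, Wt)
```

For example, with focal length 10, a constant depth of 2 and `t = [0.2, 0, 0]`, every pixel moves one column to the right and the leftmost column is left unfilled (`mask` is False there), which is exactly the kind of region the inpainting step is responsible for.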

Abstract (original)

We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between generated images and geometry, we propose cross-modal attention distillation, where attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach achieves synergistic effects, facilitating geometrically robust image synthesis as well as well-defined geometry prediction. We further introduce proximity-based mesh conditioning to integrate depth and normal cues, interpolating between point cloud and filtering erroneously predicted geometry from influencing the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis on both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. Project page is available at https://cvlab-kaist.github.io/MoAI.
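The cross-modal attention distillation described in the abstract can be illustrated with a single attention layer. The sketch below is one plausible reading, not the authors' implementation: a hypothetical single-head, unbatched NumPy layer in which the attention weights are computed only from image-branch features and then applied to the geometry branch's own values, so geometry tokens are aggregated from the same positions as image tokens. All names (`cross_modal_attention`, `Wq_img`, `Wv_geo`, and so on) are illustrative assumptions; any training-time objective and the choice of which layers receive the injected maps are not shown.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(q, k):
    """Scaled dot-product attention weights, shape (N_query, N_key)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))


def cross_modal_attention(x_img, x_geo, Wq_img, Wk_img, Wv_img, Wv_geo):
    """Hypothetical layer: the geometry branch reuses the image branch's attention map.

    x_img, x_geo: (N, d) token features of the image and geometry branches.
    Only the value projection is branch-specific; the attention map A is
    computed from image features alone and shared by both branches.
    """
    A = attention_map(x_img @ Wq_img, x_img @ Wk_img)  # depends on x_img only
    out_img = A @ (x_img @ Wv_img)
    out_geo = A @ (x_geo @ Wv_geo)  # injected map, geometry branch's own values
    return out_img, out_geo, A
```

Because `A` never reads `x_geo`, changing the geometry input changes `out_geo` but leaves the attention map untouched; under this reading, that shared map is what keeps the spatial correspondences of the two branches aligned.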

arXiv information

Authors	Min-Seop Kwak, Junho Kim, Sangdoo Yun, Dongyoon Han, Taekyoung Kim, Seungryong Kim, Jin-Hwa Kim
Published	2025-06-13 16:19:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
