Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

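Summary

3D photography renders a still image into a video with appealing 3D visual effects.
Existing approaches typically first perform monocular depth estimation, then render the input frame into subsequent frames from various viewpoints, and finally use an inpainting model to fill the missing or occluded regions.
The inpainting model plays a crucial role in rendering quality, but it is usually trained on out-of-domain data.
To reduce the gap between training and inference, we propose a novel self-supervised diffusion model as the inpainting module.
Given a single input image, training pairs of a masked occluded image and the ground-truth image are constructed automatically through random cycle-rendering.
The constructed training samples are closely aligned with the test instances and require no data annotation.
To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet to enhance the semantic conditions.
Toward real-world animation, we present a new task, out-animation, which extends the space and time of input objects.
Extensive experiments on real datasets show that our method achieves results competitive with existing SOTA methods.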

Summary (original)

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle-rendering. The constructed training samples are closely aligned to the testing instances, without the need of data annotation. To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.
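
The abstract does not spell out how cycle-rendering builds a training pair, so the following is only a minimal sketch under stated assumptions: the virtual camera moves horizontally, per-pixel disparity is `shift / depth`, and pixels are splatted with a nearest-surface-wins rule. The names `forward_warp` and `cycle_render_pair` are hypothetical and do not come from the paper. The idea illustrated is that rendering the input to a virtual view and then back to the original view leaves holes exactly where content was occluded or left the frame, which yields a masked image whose ground truth is the input itself.

```python
import numpy as np

def forward_warp(image, depth, valid, shift):
    """Splat valid pixels horizontally by disparity shift/depth (assumed camera model).

    When several source pixels land on the same target, the one with the
    smallest depth (nearest surface) is kept.
    """
    h, w = depth.shape
    out_img = np.zeros_like(image)
    out_depth = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            if not valid[y, x]:
                continue
            tx = x + int(np.rint(shift / depth[y, x]))
            if 0 <= tx < w and depth[y, x] < out_depth[y, tx]:
                out_depth[y, tx] = depth[y, x]
                out_img[y, tx] = image[y, x]
    return out_img, out_depth, np.isfinite(out_depth)

def cycle_render_pair(image, depth, shift):
    """Render to a virtual view and back; return (masked image, mask, ground truth)."""
    all_valid = np.ones(depth.shape, dtype=bool)
    novel_img, novel_depth, novel_valid = forward_warp(image, depth, all_valid, shift)
    # Rendering back with the opposite shift exposes pixels the virtual view lost.
    _, _, back_valid = forward_warp(novel_img, novel_depth, novel_valid, -shift)
    masked = image * back_valid[..., None]
    return masked, back_valid, image

# Toy scene: far background (depth 4) with a near object (depth 1) in columns 2-3.
image = np.arange(1, 4 * 8 * 3 + 1).reshape(4, 8, 3)
depth = np.full((4, 8), 4.0)
depth[:, 2:4] = 1.0
masked, mask, target = cycle_render_pair(image, depth, shift=4.0)
print(mask[0].astype(int))  # 1 = kept pixel, 0 = hole to be inpainted
```

In a training loop, `shift` would presumably be sampled at random for each sample, and the pair `(masked, target)` together with `mask` would be passed to the inpainting model; that part is omitted here because the abstract gives no details.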

arXiv information

Authors	Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
Published	2023-02-21 16:18:40+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
