Towards Realistic 3D Embedding via View Alignment

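Abstract

Recent advances in generative adversarial networks (GANs) have brought great success to automated image composition, which generates new images by automatically embedding foreground objects of interest into background images. However, most existing work handles foreground objects in 2D images, even though foreground objects given as 3D models are more flexible because they can be viewed from any direction over 360 degrees. This paper proposes an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator, which are interconnected and end-to-end trainable. The differential discriminator guides the learning of geometric transformations from background images so that the composed 3D models can be placed in the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism to generate accurate object textures for the 3D models under the estimated views. Extensive experiments on two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition, both qualitatively and quantitatively, compared with state-of-the-art generation methods.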

Abstract (original)

Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition that generates new images by embedding interested foreground objects into background images automatically. On the other hand, most existing works deal with foreground objects in two-dimensional (2D) images though foreground objects in three-dimensional (3D) models are more flexible with 360-degree view freedom. This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable. The differential discriminator guides to learn geometric transformation from background images so that the composed 3D models can be aligned with the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism for generating accurate object textures for the 3D models under the estimated views. Extensive experiments over two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition qualitatively and quantitatively as compared with state-of-the-art generation methods.

arXiv information

Authors	Changgong Zhang, Fangneng Zhan, Shijian Lu, Feiying Ma, Xuansong Xie
Published	2022-10-03 17:09:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
