PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

Summary

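Controllable generation is considered a potentially important approach to the difficulty of annotating 3D data, and the precision of such controllable generation is especially critical when generating data for autonomous driving.
Existing methods focus on integrating diverse generative information into the control inputs, using frameworks such as GLIGEN or ControlNet, and achieve commendable results in controllable generation.
However, such approaches inherently limit generation performance to the learning capacity of a predefined network architecture.
In this paper, we examine how control information is integrated and introduce PerlDiff (Perspective-Layout Diffusion Models), an effective street view image generation method that makes full use of perspective 3D geometric information.
PerlDiff uses 3D geometric priors to guide street view image generation with precise object-level control inside the network learning process, producing more robust and controllable outputs.
It also achieves better controllability than other layout control methods.
Experimental results show that PerlDiff markedly improves generation precision on the NuScenes and KITTI datasets.
Our code and models are available at https://github.com/LabShuHangGU/PerlDiff.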
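
The summary says PerlDiff draws on perspective 3D geometric information. To show roughly what a "perspective layout" can mean, the sketch below projects a 3D bounding box into a camera image with a pinhole model and rasterizes a coarse object mask. This is not the authors' implementation. The camera-frame convention (x right, y down, z forward), the box parameterization, and the helper names `box_corners`, `project`, and `layout_mask` are all assumptions. The mask is also only the 2D bounding rectangle of the projected corners, not the exact box silhouette.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Return the 8 corners of a 3D box given in camera coordinates.

    Assumed convention (hypothetical): x right, y down, z forward;
    size = (length along x, height along y, width along z) before rotation;
    yaw rotates the box about the camera's vertical (y) axis.
    """
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the horizontal offset (dx, dz) about the y axis.
                x = cx + c * dx + s * dz
                z = cz - s * dx + c * dz
                corners.append((x, cy + dy, z))
    return corners


def project(points: List[Point3], fx: float, fy: float,
            pu: float, pv: float) -> List[Point2]:
    """Pinhole projection to pixels; points behind the camera are dropped."""
    uv = []
    for x, y, z in points:
        if z <= 0:
            continue
        uv.append((fx * x / z + pu, fy * y / z + pv))
    return uv


def layout_mask(uv: List[Point2], width: int, height: int) -> List[List[int]]:
    """Fill the clipped 2D bounding rectangle of the projected corners with 1s."""
    mask = [[0] * width for _ in range(height)]
    if not uv:
        return mask
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    u0, u1 = max(0, math.floor(min(us))), min(width - 1, math.ceil(max(us)))
    v0, v1 = max(0, math.floor(min(vs))), min(height - 1, math.ceil(max(vs)))
    for v in range(v0, v1 + 1):
        for u in range(u0, u1 + 1):
            mask[v][u] = 1
    return mask


# Usage: a 2 m cube 10 m ahead of a 100x100 camera (fx = fy = 100, center 50, 50).
corners = box_corners((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), yaw=0.0)
mask = layout_mask(project(corners, 100.0, 100.0, 50.0, 50.0), 100, 100)
print(sum(map(sum, mask)))  # 625 pixels (a 25 x 25 square)
```

A per-object mask like this is one simple way to turn 3D annotations into an image-aligned control signal. A real pipeline would more likely rasterize the convex hull of the projected corners and handle boxes that cross the camera plane.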

Abstract (original)

Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. Our PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results justify that our PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our codes and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.
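
The abstract states that 3D geometric priors guide generation with object-level control inside the network learning process. One generic way to inject such a prior is to add a bias to the cross-attention logits, so that image tokens inside an object's projected layout attend more strongly to that object's condition token. This is offered only as a hypothetical illustration and is not a description of PerlDiff's exact mechanism. The function name `masked_cross_attention` and the `strength` parameter are assumptions.

```python
import math
from typing import List


def masked_cross_attention(logits: List[List[float]],
                           masks: List[List[int]],
                           strength: float = 2.0) -> List[List[float]]:
    """Cross-attention weights with an additive layout bias (illustrative).

    logits[i][k]: score between image token i and object condition token k.
    masks[k][i]: 1 if image token i falls inside object k's projected layout.
    strength: hypothetical bias added to the logit wherever the mask is 1.
    """
    weights = []
    for i, row in enumerate(logits):
        biased = [score + strength * masks[k][i] for k, score in enumerate(row)]
        peak = max(biased)  # subtract the max for numerical stability
        exps = [math.exp(b - peak) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Two image tokens, two objects; each object's layout covers one token.
w = masked_cross_attention([[0.0, 0.0], [0.0, 0.0]], [[1, 0], [0, 1]])
print([[round(x, 3) for x in row] for row in w])  # [[0.881, 0.119], [0.119, 0.881]]
```

The bias is added before the softmax, so every row remains a valid probability distribution. Setting `strength` to 0 recovers plain, unguided cross-attention.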

arXiv information

Authors	Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu
Published	2024-07-08 16:46:47+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google