CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models

Summary

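Incorporating a customized object into image generation is an attractive capability for text-to-image generation.
However, existing optimization-based and encoder-based methods are held back by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-paste effect.
To overcome these limitations, we introduce CustomNet, a new object customization approach that explicitly incorporates 3D novel view synthesis into the object customization process.
This integration makes it easier to adjust spatial position relationships and viewpoints, producing diverse outputs while effectively preserving object identity.
We also introduce careful design choices that enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods.
In addition, we leverage a dataset construction pipeline that handles real-world objects and complex backgrounds better.
With these designs, our method supports zero-shot object customization without test-time optimization and offers simultaneous control over viewpoint, location, and background.
As a result, CustomNet improves identity preservation and generates diverse, harmonious outputs.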

Summary (original)

Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.

arXiv information

Authors	Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
Published	2023-12-07 15:22:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
