TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Abstract

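Text-driven diffusion models have shown impressive generative capabilities and enable a variety of image editing tasks.
This paper proposes TF-ICON, a novel training-free image composition framework that harnesses text-driven diffusion models for cross-domain image-guided composition.
The task aims to seamlessly integrate user-provided objects into a specific visual context.
Current diffusion-based methods often rely on costly instance-based optimization or on finetuning pretrained models on customized datasets, which can undermine the rich prior those models have learned.
In contrast, TF-ICON leverages off-the-shelf diffusion models to perform cross-domain image-guided composition without any additional training, finetuning, or optimization.
The authors also introduce the exceptional prompt, a prompt that contains no information, which helps text-driven diffusion models accurately invert real images into latent representations that form the basis for compositing.
Their experiments show that Stable Diffusion equipped with the exceptional prompt outperforms state-of-the-art inversion methods on several datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines across versatile visual domains.
Code is available at https://github.com/Shilin-LU/TF-ICON.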

Abstract (original)

Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON

arXiv information

Authors	Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong
Published	2023-07-25 15:17:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
