DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Summary

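In image generation, creating customized images from a visual prompt with additional textual instructions has emerged as a promising direction.
However, existing methods, whether tuning-based or tuning-free, struggle to interpret the subject-essential attributes from the visual prompt.
As a result, subject-irrelevant attributes leak into the generation process, ultimately degrading personalization quality in both editability and ID preservation.
This paper presents DisEnvisioner, a novel approach that effectively extracts and enriches the subject-essential features while filtering out irrelevant information. This enables strong customization performance in a tuning-free manner, using only a single image.
Specifically, the features of the subject and the other, irrelevant components are effectively separated into distinct visual tokens, enabling more accurate customization.
To further improve ID consistency, the disentangled features are enriched and sculpted into more granular representations.
Experiments demonstrate that the approach outperforms existing methods in instruction response (editability), ID consistency, inference speed, and overall image quality, highlighting the effectiveness and efficiency of DisEnvisioner.
Project page: https://disenvisioner.github.io/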

Abstract (original)

In the realm of image generation, creating customized images from visual prompt with additional textual instruction emerges as a promising endeavor. However, existing methods, both tuning-based and tuning-free, struggle with interpreting the subject-essential attributes from the visual prompt. This leads to subject-irrelevant attributes infiltrating the generation process, ultimately compromising the personalization quality in both editability and ID preservation. In this paper, we present DisEnvisioner, a novel approach for effectively extracting and enriching the subject-essential features while filtering out -irrelevant information, enabling exceptional customization performance, in a tuning-free manner and using only a single image. Specifically, the feature of the subject and other irrelevant components are effectively separated into distinctive visual tokens, enabling a much more accurate customization. Aiming to further improving the ID consistency, we enrich the disentangled features, sculpting them into more granular representations. Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and the overall image quality, highlighting the effectiveness and efficiency of DisEnvisioner. Project page: https://disenvisioner.github.io/.

arXiv information

Authors	Jing He, Haodong Li, Yongzhe Hu, Guibao Shen, Yingjie Cai, Weichao Qiu, Ying-Cong Chen
Published	2024-10-28 15:35:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
