ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

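Summary

Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multimodal data for visual content creation, particularly in text-to-image generation.
In this paper, we propose a new task for "stylizing" text-to-image models: text-driven stylized image generation, which further enhances editability in content creation.
Given an input text prompt and a style image, the task is to generate a stylized image that is semantically relevant to the prompt and at the same time matches the style of the style image.
To achieve this, we present a new diffusion model (ControlStyle) that upgrades a pre-trained text-to-image model with a trainable modulation network, allowing it to be conditioned on both text prompts and style images.
In addition, diffusion style and content regularizations are introduced together to facilitate learning of this modulation network using these diffusion priors, in pursuit of high-quality stylized text-to-image generation.
Extensive experiments demonstrate that ControlStyle produces more visually pleasing and artistic results than a simple combination of a text-to-image model and conventional style transfer techniques.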

Summary (original)

Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task for “stylizing” text-to-image models, namely text-driven stylized image generation, that further enhances editability in content creation. Given input text prompt and style image, this task aims to produce stylized images which are both semantically relevant to input text prompt and meanwhile aligned with the style image in style. To achieve this, we present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network enabling more conditions of text prompts and style images. Moreover, diffusion style and content regularizations are simultaneously introduced to facilitate the learning of this modulation network with these diffusion priors, pursuing high-quality stylized text-to-image generation. Extensive experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results, surpassing a simple combination of text-to-image model and conventional style transfer techniques.

arXiv information

Authors	Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei
Published	2023-11-09 15:50:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
