Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

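Summary

Text-to-image diffusion models have made substantial progress over the past two years and can now generate highly realistic images from open-domain text descriptions.
Despite this success, text descriptions often struggle to convey fine-grained control, even when they are long and complex.
Recent studies also show that these models have difficulty understanding such complex text and generating the corresponding images.
There is therefore a growing need for control modes beyond text descriptions.
This paper introduces Uni-ControlNet, a unified framework that allows different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) to be used simultaneously, in a flexible and composable manner, within a single model.
Unlike existing methods, Uni-ControlNet only requires fine-tuning two additional adapters on top of a frozen, pre-trained text-to-image diffusion model, which avoids the large cost of training from scratch.
Moreover, thanks to some dedicated adapter designs, Uni-ControlNet needs only a constant number of adapters (namely two), regardless of how many local or global controls are used.
This reduces fine-tuning cost and model size, making the method better suited to real-world deployment, and it also makes different conditions easier to compose.
Quantitative and qualitative comparisons show that Uni-ControlNet outperforms existing methods in controllability, generation quality, and composability.
Code is available at \url{https://github.com/ShihaoZhaoZSH/Uni-ControlNet}.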

Abstract (original)

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitates the composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability. Code is available at \url{https://github.com/ShihaoZhaoZSH/Uni-ControlNet}.
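
The abstract states that the number of adapters stays constant however many local controls are used, but it does not say how. One plausible way to get that property, sketched below purely as an illustration, is to give every local condition type a fixed channel slot and fill unused slots with zeros, so that a single adapter always receives an input of the same shape. The slot list `LOCAL_SLOTS`, the function `pack_local_conditions`, and the nested-list image representation are all hypothetical names and choices made for this sketch, not taken from the paper's code.

```python
# Illustrative sketch only: the names and the zero-filling scheme are assumptions,
# not the authors' implementation.

# Assumed fixed set of local condition types, one channel slot each.
LOCAL_SLOTS = ["edge", "depth", "segmentation"]


def pack_local_conditions(conditions, height, width):
    """Stack per-pixel condition maps into a fixed number of channels.

    `conditions` maps a slot name to a height x width grid of floats.
    Slots that are not supplied are filled with zeros, so the packed
    input has the same shape no matter how many controls are active.
    """
    unknown = set(conditions) - set(LOCAL_SLOTS)
    if unknown:
        raise ValueError(f"unknown condition types: {sorted(unknown)}")

    packed = []
    for name in LOCAL_SLOTS:
        grid = conditions.get(name)
        if grid is None:
            # Missing control: use an all-zero map as a placeholder.
            grid = [[0.0] * width for _ in range(height)]
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError(f"condition '{name}' is not {height}x{width}")
        packed.append([list(row) for row in grid])
    return packed  # indexed as [channel][row][column]


# One control or two controls: the packed input has the same shape either way.
edge = [[1.0, 0.0], [0.0, 1.0]]
depth = [[0.5, 0.5], [0.2, 0.8]]
only_edge = pack_local_conditions({"edge": edge}, height=2, width=2)
edge_and_depth = pack_local_conditions({"edge": edge, "depth": depth}, height=2, width=2)
```

Because both calls return the same number of channels, a single downstream module could consume either input, which is the sense in which the adapter count need not grow with the number of controls.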

arXiv information

Authors	Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong
Published	2023-10-29 15:59:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
