Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Summary

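Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts.
How to effectively guide or control these powerful models to perform various downstream tasks has become an important open problem.
To tackle this challenge, we introduce Orthogonal Finetuning (OFT), a principled finetuning method for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT provably preserves hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere.
We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models.
To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere.
Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.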
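The energy-preservation property can be checked numerically. The sketch below is illustrative only, and several details are assumptions rather than the authors' code: neurons are taken as the columns of a weight matrix `W0`, the finetuned weight is `R @ W0` for an orthogonal `R`, `R` is built from a skew-symmetric matrix `Q` with the Cayley transform R = (I + Q)(I - Q)^{-1}, and hyperspherical energy is taken as the sum of inverse pairwise distances between unit-normalized neurons. The matrix sizes and random weights are hypothetical. Because an orthogonal `R` preserves lengths and angles, every normalized neuron is rotated by the same `R`, so all pairwise distances, and hence the energy, stay the same.

```python
import numpy as np


def hyperspherical_energy(W: np.ndarray) -> float:
    """Sum of inverse pairwise distances between unit-normalized neurons (columns of W)."""
    W_hat = W / np.linalg.norm(W, axis=0, keepdims=True)  # project each neuron onto the unit sphere
    n = W_hat.shape[1]
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                energy += 1.0 / np.linalg.norm(W_hat[:, i] - W_hat[:, j])
    return energy


def cayley(Q: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric Q to an orthogonal matrix R = (I + Q)(I - Q)^{-1}."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)


rng = np.random.default_rng(0)
d, n = 8, 16
W0 = rng.standard_normal((d, n))  # hypothetical "pretrained" weight: n neurons in R^d

A = 0.1 * rng.standard_normal((d, d))
Q = A - A.T                       # skew-symmetric: Q^T = -Q
R = cayley(Q)                     # orthogonal: R^T R = I

W_ft = R @ W0                     # OFT-style update: rotate all neurons jointly

print(np.allclose(R.T @ R, np.eye(d)))                                      # True
print(np.isclose(hyperspherical_energy(W0), hyperspherical_energy(W_ft)))  # True
```

By contrast, an additive update such as `W0 + delta_W` generally changes the angles between neurons, so its hyperspherical energy would not, in general, match that of `W0`.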

Summary (original)

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method — Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
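The radius constraint in COFT keeps the learned transformation close to the identity. A minimal sketch, assuming a Cayley parameterization R = (I + Q)(I - Q)^{-1} with skew-symmetric `Q`, and assuming the constraint is enforced by projecting `Q` onto a Frobenius-norm ball of radius `eps` (the helper `project_to_ball` and the value of `eps` are hypothetical, not taken from the paper), could look like this. For the Cayley map, R - I = 2Q(I - Q)^{-1}, and the spectral norm of (I - Q)^{-1} is at most 1 when `Q` is skew-symmetric, so bounding ||Q||_F by `eps` bounds ||R - I||_F by 2·`eps` while `R` stays exactly orthogonal.

```python
import numpy as np


def cayley(Q: np.ndarray) -> np.ndarray:
    """Map a skew-symmetric Q to an orthogonal matrix R = (I + Q)(I - Q)^{-1}."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)


def project_to_ball(Q: np.ndarray, eps: float) -> np.ndarray:
    """Hypothetical helper: rescale Q so its Frobenius norm is at most eps."""
    norm = np.linalg.norm(Q)  # Frobenius norm for 2-D arrays
    return Q if norm <= eps else Q * (eps / norm)


rng = np.random.default_rng(1)
d, eps = 8, 0.05                    # hypothetical size and radius
A = rng.standard_normal((d, d))
Q = project_to_ball(A - A.T, eps)  # rescaling keeps Q skew-symmetric
R = cayley(Q)

print(np.allclose(R.T @ R, np.eye(d)))        # R remains orthogonal
print(np.linalg.norm(R - np.eye(d)), 2 * eps)  # deviation from identity vs. its bound
```

In a training loop, the projection would be applied to `Q` after each optimizer step, so the rotation can only move a bounded distance away from the pretrained weights.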

arXiv information

Authors	Zeju Qiu,Weiyang Liu,Haiwen Feng,Yuxuan Xue,Yao Feng,Zhen Liu,Dan Zhang,Adrian Weller,Bernhard Schölkopf
Publication date	2023-06-12 17:59:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
