Multi-Concept Customization of Text-to-Image Diffusion

Summary

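Generative models produce high-quality images of concepts learned from large-scale datasets, but users often want to synthesize instances of their own concepts, such as family members, pets, or personal items.
Given only a few examples, can a model be taught to acquire a new concept quickly?
Furthermore, can multiple new concepts be composed together?
We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models.
We find that optimizing only a few parameters in the text-to-image conditioning mechanism is powerful enough to represent new concepts while enabling fast tuning (about 6 minutes).
In addition, multiple concepts can be trained jointly, or multiple fine-tuned models can be combined into one via closed-form constrained optimization.
Our fine-tuned model generates variations of multiple new concepts and composes them seamlessly with existing concepts in novel settings.
Our method outperforms several baselines and concurrent works in both qualitative and quantitative evaluations, while being memory- and compute-efficient.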
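
To make the idea of "optimizing only a few parameters in the conditioning mechanism" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The parameter names and sizes are hypothetical, and treating the key and value projections of cross-attention layers as the tuned subset is an assumption drawn from the full paper rather than from this summary.

```python
# Hypothetical parameter table: name -> number of weights.
# Names and sizes are illustrative only, not taken from a real checkpoint.
PARAMS = {
    "block1.self_attn.to_q.weight": 320 * 320,
    "block1.self_attn.to_k.weight": 320 * 320,
    "block1.self_attn.to_v.weight": 320 * 320,
    "block1.cross_attn.to_q.weight": 320 * 320,
    "block1.cross_attn.to_k.weight": 320 * 768,  # projects text features
    "block1.cross_attn.to_v.weight": 320 * 768,  # projects text features
    "block1.ff.weight": 320 * 1280,
}


def is_trainable(name: str) -> bool:
    """Assumed rule: tune only key/value projections in cross-attention."""
    return "cross_attn" in name and (".to_k." in name or ".to_v." in name)


def split_params(params: dict) -> tuple:
    """Split a name -> size table into (trainable, frozen) tables."""
    trainable = {n: c for n, c in params.items() if is_trainable(n)}
    frozen = {n: c for n, c in params.items() if n not in trainable}
    return trainable, frozen


trainable, frozen = split_params(PARAMS)
fraction = sum(trainable.values()) / sum(PARAMS.values())

print("trainable:", sorted(trainable))
print(f"trainable fraction of this toy table: {fraction:.2%}")
```

In a real training setup the same name filter would decide which weights receive gradients while everything else stays frozen. The fraction printed here reflects only the toy table above and says nothing about the actual model.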

Summary (original)

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple, new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms several baselines and concurrent works, regarding both qualitative and quantitative evaluations, while being memory and computationally efficient.

arXiv information

Authors	Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
Published	2022-12-08 18:57:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
