Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

Summary

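The authors present multimodal conditioning modules (MCM), which enable conditional image synthesis with pretrained diffusion models.
Earlier work on multimodal synthesis either trains networks from scratch or fine-tunes pretrained ones; both are computationally expensive for large, state-of-the-art diffusion models.
The proposed method uses a pretrained network but never updates the diffusion network's parameters.
MCM is a small module trained to modulate the diffusion network's predictions during sampling, using 2D modalities (e.g., semantic segmentation maps, sketches) that the diffusion model did not see during its original training.
The authors show that MCM gives the user control over the spatial layout of the image and, more generally, more control over the image generation process.
Training MCM is cheap for three reasons: it requires no gradients from the original diffusion network, the module has only about 1% as many parameters as the base diffusion model, and it is trained on only a limited number of examples.
The method is evaluated on unconditional and text-conditional models, demonstrating improved control over the generated images and better alignment with the conditioning inputs.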
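The idea of training a small module around a frozen network can be illustrated with a toy sketch. This is not the authors' architecture or training objective, which the abstract does not specify. Everything named here (`frozen_denoiser`, `ToyModulator`, `train_toy_modulator`, the additive scale-and-shift form, and the synthetic target) is a hypothetical stand-in chosen to show one property: the frozen network's output is treated as a constant, so only the small module's parameters receive gradient updates.

```python
import random

# Hypothetical stand-in for a frozen, pretrained diffusion network.
# It maps a noisy sample x_t to a noise prediction; its weight is never updated.
FROZEN_WEIGHT = 0.8


def frozen_denoiser(x_t):
    return [FROZEN_WEIGHT * v for v in x_t]


class ToyModulator:
    """Hypothetical small module: eps_mod = eps + scale * cond + shift."""

    def __init__(self):
        self.scale = 0.0
        self.shift = 0.0

    def __call__(self, eps, cond):
        return [e + self.scale * c + self.shift for e, c in zip(eps, cond)]

    def sgd_step(self, eps, cond, target, lr=0.1):
        """One SGD step on the mean squared error; returns the loss before the update."""
        pred = self(eps, cond)
        n = len(pred)
        loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
        # Gradients are taken only with respect to this module's parameters.
        # eps is a constant here, so nothing flows back into frozen_denoiser.
        g_scale = sum(2 * (p - t) * c for p, t, c in zip(pred, target, cond)) / n
        g_shift = sum(2 * (p - t) for p, t in zip(pred, target)) / n
        self.scale -= lr * g_scale
        self.shift -= lr * g_shift
        return loss


def train_toy_modulator(steps=1000, seed=0, size=16):
    rng = random.Random(seed)
    mod = ToyModulator()
    first_loss = last_loss = None
    for _ in range(steps):
        x_t = [rng.gauss(0.0, 1.0) for _ in range(size)]
        # Flattened binary mask standing in for a 2D condition (e.g., a segmentation map).
        cond = [float(rng.randint(0, 1)) for _ in range(size)]
        eps = frozen_denoiser(x_t)  # forward pass only, no parameter update
        # Synthetic target (an assumption of this toy): the frozen prediction
        # is off by a condition-dependent amount of 0.5 * cond.
        target = [e + 0.5 * c for e, c in zip(eps, cond)]
        last_loss = mod.sgd_step(eps, cond, target)
        if first_loss is None:
            first_loss = last_loss
    return mod, first_loss, last_loss


if __name__ == "__main__":
    mod, first_loss, last_loss = train_toy_modulator()
    print(f"scale={mod.scale:.4f} shift={mod.shift:.4f}")
    print(f"loss: {first_loss:.4f} -> {last_loss:.2e}")
```

In this toy the module has two parameters and the "network" has one, so the ratio is not meaningful; the roughly 1% figure in the summary refers to the real MCM and base diffusion model.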

Abstract (original)

We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updates to the diffusion network’s parameters. MCM is a small module trained to modulate the diffusion network’s predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen during the original training of the diffusion model. We show that MCM enables user control over the spatial layout of the image and leads to increased control over the image generation process. Training MCM is cheap as it does not require gradients from the original diffusion net, consists of only $\sim$1$\%$ of the number of parameters of the base diffusion model, and is trained using only a limited number of training examples. We evaluate our method on unconditional and text-conditional models to demonstrate the improved control over the generated images and their alignment with respect to the conditioning inputs.

arXiv information

Authors	Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz
Published	2023-02-24 17:28:08+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
