Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

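Summary

Latent diffusion models (LDMs) have achieved remarkable results in high-resolution image synthesis.
However, their iterative sampling process is computationally expensive, which makes generation slow.
Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), which enable fast inference in very few steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.).
Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to predict the solution of that ODE directly in latent space, which reduces the need for many iterations and allows fast, high-fidelity sampling.
Distilled efficiently from a pre-trained classifier-free guided diffusion model, a high-quality 768 x 768 LCM that samples in 2 to 4 steps takes only 32 A100 GPU hours to train.
In addition, we introduce Latent Consistency Fine-tuning (LCF), a new method tailored to fine-tuning LCMs on customized image datasets.
Evaluation on the LAION-5B-Aesthetics dataset shows that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference.
Project page: https://latent-consistency-models.github.io/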
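
The idea of predicting the PF-ODE solution directly, then refining it over a handful of steps, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1-D Gaussian "data" distribution, for which the PF-ODE solution map is known in closed form, and a cosine noise schedule chosen purely for illustration. The names `DATA_MEAN`, `DATA_STD`, `alpha_sigma`, `consistency_fn`, and `multistep_sample` are hypothetical. A real LCM would replace `consistency_fn` with a trained network that also takes the text condition and the guidance scale as inputs.

```python
import math
import random
import statistics

# Toy 1-D Gaussian "data" distribution (illustrative values, not from the paper).
DATA_MEAN, DATA_STD = 2.0, 0.5


def alpha_sigma(t):
    """Assumed variance-preserving schedule on t in [0, 1]: z_t = alpha*z_0 + sigma*eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def consistency_fn(z_t, t):
    """Map a noisy point (z_t, t) straight to the start of its PF-ODE trajectory.

    For Gaussian data the marginal at time t is Gaussian too, and the PF-ODE
    flow is the increasing affine map between the two Gaussians.
    """
    alpha, sigma = alpha_sigma(t)
    marginal_std = math.sqrt(alpha**2 * DATA_STD**2 + sigma**2)
    return DATA_MEAN + DATA_STD * (z_t - alpha * DATA_MEAN) / marginal_std


def multistep_sample(timesteps, rng):
    """Few-step sampling: predict z_0, re-noise to the next timestep, predict again."""
    z_0 = consistency_fn(rng.gauss(0.0, 1.0), timesteps[0])  # start from pure noise
    for t in timesteps[1:]:
        alpha, sigma = alpha_sigma(t)
        z_t = alpha * z_0 + sigma * rng.gauss(0.0, 1.0)  # re-noise the estimate
        z_0 = consistency_fn(z_t, t)  # jump back to t = 0 in one evaluation
    return z_0


rng = random.Random(0)
samples = [multistep_sample([1.0, 0.75, 0.5, 0.25], rng) for _ in range(10_000)]
print(statistics.mean(samples), statistics.stdev(samples))
```

Each sample costs one function evaluation per timestep, so the four-entry timestep list above corresponds to 4-step inference. Passing `[1.0]` alone gives single-step generation. In this toy case the map is exact, so extra steps do not change the sample distribution; with a learned, imperfect network, the additional re-noise and predict rounds are what give room to trade compute for quality.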

Abstract (original)

Latent Diffusion Models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/

arXiv information

Authors	Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao
Published	2023-10-06 17:11:58+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
