PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Abstract

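Diffusion models perform remarkably well at generating high-dimensional content, but they are computationally intensive, especially during training.
We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a new pipeline that reduces training cost through three stages: training a diffusion model on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution.
With this pipeline, PaGoDA achieves a 64× reduction in the cost of training its diffusion model by training on 8× downsampled data.
At inference, it generates in a single step and achieves state-of-the-art results on ImageNet at all resolutions from 64×64 to 512×512, as well as on text-to-image generation.
PaGoDA's pipeline can be applied directly in latent space, adding compression alongside the pretrained autoencoder in latent diffusion models (e.g., Stable Diffusion).
The code is available at https://github.com/sony/pagoda.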

Abstract (original)

The diffusion model performs remarkably well in generating high-dimensional content but is computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces the training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion, and progressive super-resolution. With the proposed pipeline, PaGoDA achieves a 64× reduced cost in training its diffusion model on 8× downsampled data, while at inference, in a single step, it achieves state-of-the-art results on ImageNet across all resolutions from 64×64 to 512×512, and on text-to-image. PaGoDA's pipeline can be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
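
The relationship between the 8× downsampling and the 64× figure can be checked with a little arithmetic. The sketch below is only an illustration, not code from the PaGoDA repository: the function names are hypothetical, pixel count is used as a rough stand-in for training cost, and the assumption that each super-resolution stage doubles the side length is ours, since the abstract only states the range 64×64 to 512×512.

```python
from typing import List


def pixel_reduction(downsample_factor: int) -> int:
    """Ratio of pixel counts between a 2D image and its downsampled version.

    Downsampling each side by `downsample_factor` shrinks the pixel count
    by the square of that factor. Treating this ratio as a proxy for
    training cost is an assumption made for illustration.
    """
    if downsample_factor <= 0:
        raise ValueError("downsample_factor must be positive")
    return downsample_factor ** 2


def resolution_schedule(base: int, target: int) -> List[int]:
    """Side lengths visited when growing from `base` to `target`.

    Assumes (hypothetically) that every stage doubles the side length.
    """
    if base <= 0 or target < base:
        raise ValueError("need 0 < base <= target")
    schedule = [base]
    while schedule[-1] < target:
        schedule.append(schedule[-1] * 2)
    if schedule[-1] != target:
        raise ValueError("target is not reachable from base by doubling")
    return schedule
```

Under these assumptions, `pixel_reduction(8)` returns `64`, which matches the 64× figure quoted for 8× downsampled data, and `resolution_schedule(64, 512)` returns `[64, 128, 256, 512]`, i.e. three doubling stages from the base resolution to the largest one.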

arXiv information

Authors	Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
Published	2024-10-29 15:26:00+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
