Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Abstract

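We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models.
Existing autoencoder models perform well at a moderate spatial compression ratio (e.g., 8x) but fail to maintain satisfactory reconstruction accuracy at high spatial compression ratios (e.g., 64x).
We address this challenge with two key techniques. (1) Residual Autoencoding: the models are designed to learn residuals based on space-to-channel transformed features, which alleviates the optimization difficulty of high spatial-compression autoencoders.
(2) Decoupled High-Resolution Adaptation: an efficient, decoupled three-phase training strategy that mitigates the generalization penalty of high spatial-compression autoencoders.
With these designs, we raise the autoencoder's spatial compression ratio up to 128 while maintaining reconstruction quality.
Applying DC-AE to latent diffusion models yields a significant speedup without a drop in accuracy.
For example, on ImageNet 512×512, DC-AE provides a 19.1x inference speedup and a 17.9x training speedup on an H100 GPU for UViT-H, compared with the widely used SD-VAE-f8 autoencoder, while achieving a better FID.
Our code is available at https://github.com/mit-han-lab/efficientvit.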
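
As a rough illustration of what "learning residuals based on space-to-channel transformed features" could look like, the NumPy sketch below builds a parameter-free downsampling shortcut and adds a learned branch on top of it. Everything here is an assumption made for illustration: the function names, the use of channel averaging to match the output width, and the zero-valued stand-in for the learned branch are hypothetical and are not taken from the DC-AE code in the repository above.

```python
import numpy as np

def space_to_channel(x, r):
    """Move each r x r spatial block into channels: (C, H, W) -> (C*r*r, H/r, W/r)."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)   # split H and W into (block index, offset)
    x = x.transpose(0, 2, 4, 1, 3)           # (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)

def channel_average(x, out_channels):
    """Reduce the channel count by averaging equal-sized groups (assumed matching step)."""
    c, h, w = x.shape
    assert c % out_channels == 0, "channels must be divisible by out_channels"
    return x.reshape(out_channels, c // out_channels, h, w).mean(axis=1)

def residual_downsample(x, learned_branch, r, out_channels):
    """Output = learned branch + non-learned space-to-channel shortcut."""
    shortcut = channel_average(space_to_channel(x, r), out_channels)
    return learned_branch(x) + shortcut

# Toy usage: 4 channels at 8x8, downsampled 2x to 8 channels at 4x4.
x = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
zero_branch = lambda t: np.zeros((8, 4, 4))  # stand-in for a trained conv block
y = residual_downsample(x, zero_branch, r=2, out_channels=8)
print(y.shape)  # (8, 4, 4)
```

In this reading, the shortcut already carries a rearranged copy of the input to the lower resolution, so the learned branch only has to model the difference from it. That is one plausible interpretation of why such a design would ease optimization at high compression ratios.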

Abstract (original)

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder’s spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512×512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at https://github.com/mit-han-lab/efficientvit.
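
To make the compression ratios concrete, the short Python sketch below computes the latent grid size for a 512×512 image at several spatial compression ratios. It assumes that a ratio such as 8x or 64x is a per-side downsampling factor (as the name SD-VAE-f8 suggests); `latent_grid` is a hypothetical helper, and the sketch says nothing about latent channel counts or the measured speedups.

```python
def latent_grid(image_size, f):
    """Return (side, positions) of the latent grid for a square image.

    Assumes `f` is the per-side spatial downsampling factor.
    """
    if image_size % f != 0:
        raise ValueError("image_size must be divisible by f")
    side = image_size // f
    return side, side * side

for f in (8, 64, 128):
    side, positions = latent_grid(512, f)
    print(f"f={f}: {side}x{side} latent grid, {positions} spatial positions")
# f=8: 64x64 latent grid, 4096 spatial positions
# f=64: 8x8 latent grid, 64 spatial positions
# f=128: 4x4 latent grid, 16 spatial positions
```

Under this assumption, going from f=8 to f=64 leaves a diffusion model with 64 times fewer spatial positions to process at the same image resolution.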

arXiv information

Authors	Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han
Published	2024-12-10 16:39:23+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
