Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

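Summary

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models.
Existing autoencoders deliver impressive results at moderate spatial compression ratios (e.g., 8x) but fail to maintain satisfactory reconstruction accuracy at high spatial compression ratios (e.g., 64x).
We address this challenge with two key techniques. (1) Residual Autoencoding: the models are designed to learn residuals based on space-to-channel transformed features, which alleviates the optimization difficulty of high spatial-compression autoencoders.
(2) Decoupled High-Resolution Adaptation: an efficient, decoupled three-phase training strategy that mitigates the generalization penalty of high spatial-compression autoencoders.
With these designs, the autoencoder's spatial compression ratio is raised up to 128 while reconstruction quality is maintained.
Applying DC-AE to latent diffusion models yields a significant speedup without an accuracy drop.
For example, on ImageNet 512×512, DC-AE provides a 19.1x inference speedup and a 17.9x training speedup on an H100 GPU for UViT-H while achieving a better FID than the widely used SD-VAE-f8 autoencoder.
The code is available at https://github.com/mit-han-lab/efficientvit.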
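
To make the "space-to-channel" idea behind Residual Autoencoding more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the channel-averaging step used to match channel counts, and the `learned_branch` callable are all hypothetical choices made for this example. The abstract only states that the model learns residuals on top of space-to-channel transformed features.

```python
import numpy as np

def space_to_channel(x, r):
    """Rearrange a (C, H, W) array into (C*r*r, H/r, W/r) without dropping values."""
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)  # -> (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)

def shortcut_downsample(x, r, out_channels):
    """Parameter-free shortcut: space-to-channel, then average groups of channels.

    The averaging step is an assumption of this sketch; it is only one way
    to reduce C*r*r channels to out_channels.
    """
    y = space_to_channel(x, r)
    c, h, w = y.shape
    assert c % out_channels == 0, "out_channels must divide C*r*r"
    return y.reshape(out_channels, c // out_channels, h, w).mean(axis=1)

def residual_downsample(x, r, out_channels, learned_branch):
    """The learned branch only has to model a residual on top of the shortcut."""
    return learned_branch(x) + shortcut_downsample(x, r, out_channels)
```

In a real model `learned_branch` would be a trainable downsampling layer; here any callable that returns an array of shape `(out_channels, H/r, W/r)` works. Because the shortcut already carries every input value into the lower-resolution tensor, the trainable part starts from a lossless rearrangement rather than from scratch.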

Abstract (original)

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder’s spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512×512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at https://github.com/mit-han-lab/efficientvit.
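
The compression ratios quoted above translate directly into latent grid sizes. The short sketch below works through that arithmetic for a 512×512 image. The helper name `latent_positions` is hypothetical, and the calculation covers only the spatial grid of the latent. The number of tokens a diffusion transformer actually processes also depends on its patch size, which is not modeled here, and the reported 19.1x and 17.9x speedups are measurements from the paper, not values derived from this calculation.

```python
def latent_positions(image_size, spatial_ratio):
    """Return (side length, number of spatial positions) of the latent grid."""
    side = image_size // spatial_ratio
    return side, side * side

for f in (8, 64, 128):
    side, n = latent_positions(512, f)
    print(f"f{f}: {side}x{side} latent, {n} spatial positions")
# f8: 64x64 latent, 4096 spatial positions
# f64: 8x8 latent, 64 spatial positions
# f128: 4x4 latent, 16 spatial positions
```

Moving from 8x to 64x spatial compression shrinks the latent grid by a factor of 64 in area, which is why a high-compression autoencoder can cut the cost of the diffusion model that operates on the latent.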

arXiv information

Authors	Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han
Published	2025-02-26 17:56:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
