Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

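Summary

Text-to-image generation with Stable Diffusion models has achieved notable success thanks to their remarkable generative ability.
However, the iterative denoising process is computationally intensive at inference time, which makes diffusion models poorly suited to real-world applications that require low latency and scalability.
Recent studies have compressed diffusion models using post-training quantization (PTQ) and quantization-aware training (QAT).
Nevertheless, prior work has often neglected to examine how consistent the outputs of a quantized model are with those of the floating-point model.
This consistency is crucial in fields such as content creation, design, and edge deployment, because it can significantly improve both practitioner efficiency and system stability.
To ensure that quantized models generate high-quality, consistent images, the authors propose an efficient quantization framework for Stable Diffusion models.
The approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes while also ensuring training stability.
Building on this pipeline, the authors further introduce a mixed-precision quantization strategy, multi-timestep activation quantization, and time-information precalculation to ensure high-fidelity generation relative to the floating-point model.
Through extensive experiments with Stable Diffusion v1-4, v2-1, and XL 1.0, the authors show that the method outperforms current state-of-the-art techniques on prompts from the COCO validation dataset and the Stable-Diffusion-Prompts dataset.
Under the W4A8 quantization setting, the approach improves both distribution similarity and visual similarity by 45% to 60%.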
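For readers unfamiliar with the notation, W4A8 means 4-bit weights and 8-bit activations. The sketch below illustrates only that bit-width trade-off with plain symmetric uniform quantization; it is not the paper's method (no calibration pipeline, mixed precision, or timestep handling), and the function names and sample values are hypothetical.

```python
def fake_quantize(values, num_bits):
    """Symmetric uniform quantization: snap to a signed integer grid, then map back to floats.

    Illustrative assumption only; the paper's actual quantizer is not described in the abstract.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax                    # width of one quantization step
    out = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # dequantize back to a float
    return out


def max_error(reference, approx):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Hypothetical sample tensors, flattened to lists for simplicity.
weights = [0.62, -0.35, 0.08, -0.91, 0.27]
activations = [1.50, -0.20, 0.75, 2.10, -1.30]

w4 = fake_quantize(weights, num_bits=4)       # "W4": 4-bit weights
a8 = fake_quantize(activations, num_bits=8)   # "A8": 8-bit activations

print("max weight error (4-bit):    ", max_error(weights, w4))
print("max activation error (8-bit):", max_error(activations, a8))
```

The 4-bit grid has only 15 levels, so its rounding error is much coarser than the 8-bit grid's; that gap is why preserving fidelity at W4A8 is nontrivial.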

Abstract (original)

Text-to-image generation of Stable Diffusion models has achieved notable success due to its remarkable generation ability. However, the repetitive denoising process is computationally intensive during inference, which renders Diffusion models less suitable for real-world applications that require low latency and scalability. Recent studies have employed post-training quantization (PTQ) and quantization-aware training (QAT) methods to compress Diffusion models. Nevertheless, prior research has often neglected to examine the consistency between results generated by quantized models and those from floating-point models. This consistency is crucial in fields such as content creation, design, and edge deployment, as it can significantly enhance both efficiency and system stability for practitioners. To ensure that quantized models generate high-quality and consistent images, we propose an efficient quantization framework for Stable Diffusion models. Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes, as well as ensuring training stability. Based on this pipeline, we further introduce a mix-precision quantization strategy, multi-timestep activation quantization, and time information precalculation techniques to ensure high-fidelity generation in comparison to floating-point models. Through extensive experiments with Stable Diffusion v1-4, v2-1, and XL 1.0, we have demonstrated that our method outperforms the current state-of-the-art techniques when tested on prompts from the COCO validation dataset and the Stable-Diffusion-Prompts dataset. Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.

arXiv information

Authors	Shuaiting Li, Juncan Deng, Zeyu Wang, Hong Gu, Kedong Xu, Haibin Shen, Kejie Huang
Published	2024-12-09 17:00:20+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
