HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

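Summary

Diffusion Transformers (DiTs) have recently drawn substantial attention in both industry and academia for their visual generation quality, which surpasses that of traditional U-Net-based diffusion models.
That improved performance comes with high parameter counts and implementation costs, which severely limits their use on resource-constrained devices such as mobile phones.
To address this, the authors introduce Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that uses 4-bit floating-point (FP) precision for both weights and activations during DiT inference.
Compared with fixed-point quantization (e.g., INT8), FP quantization, complemented by the proposed clipping range selection mechanism, naturally matches the data distribution inside DiTs and therefore keeps quantization error small.
HQ-DiT also applies a universal identity mathematical transform to mitigate the severe quantization error caused by outliers.
Experiments show that DiTs can be quantized to very low precision (4 bits) with negligible impact on performance.
According to the authors, this is the first approach in which both the weights and the activations of a DiT are quantized to just 4 bits, with an sFID increase of only 0.12 on ImageNet.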
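To make the idea of 4-bit FP quantization with a chosen clipping range more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) for the FP4 grid and uses a plain grid search over clipping ranges that minimizes mean squared error; the summary does not state which FP4 format or which selection rule HQ-DiT actually uses, and the names `quantize`, `mse`, and `search_clip` are hypothetical.

```python
import random

# Magnitudes representable by a 4-bit float with 1 sign, 2 exponent and
# 1 mantissa bit (E2M1). The choice of this format is an assumption.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({s * v for v in FP4_E2M1 for s in (-1.0, 1.0)})

# Symmetric 4-bit integer grid, shown only for comparison.
INT4_GRID = [float(i) for i in range(-7, 8)]


def quantize(values, grid, clip):
    """Clip to [-clip, clip], scale so `clip` maps to the largest grid
    value, round to the nearest grid point, then scale back."""
    scale = clip / max(grid)
    out = []
    for v in values:
        t = max(-clip, min(clip, v)) / scale
        q = min(grid, key=lambda g: abs(g - t))
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_clip(values, grid, steps=50):
    """Hypothetical stand-in for a clipping range selection rule:
    try evenly spaced clipping ranges and keep the one with lowest MSE."""
    max_abs = max(abs(v) for v in values)
    best_clip = max_abs
    best_err = mse(values, quantize(values, grid, max_abs))
    for i in range(1, steps):
        clip = max_abs * i / steps
        err = mse(values, quantize(values, grid, clip))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(500)]
    for name, grid in (("FP4 (E2M1)", FP4_GRID), ("INT4", INT4_GRID)):
        clip, err = search_clip(data, grid)
        print(f"{name}: clip={clip:.3f}, mse={err:.5f}")
```

The FP4 grid is denser near zero and sparser at large magnitudes, which is the property the summary refers to when it says FP quantization matches the data distribution; the searched clipping range can never be worse than using the full range, since the full range is the starting candidate.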

Abstract (original)

Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones. To address these challenges, we introduce the Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that utilizes 4-bit floating-point (FP) precision on both weights and activations for DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in a minimal quantization error. Furthermore, HQ-DiT also implements a universal identity mathematical transform to mitigate the serious quantization error caused by the outliers. The experimental results demonstrate that DiT can achieve extremely low-precision quantization (i.e., 4 bits) with negligible impact on performance. Our approach marks the first instance where both weights and activations in DiTs are quantized to just 4 bits, with only a 0.12 increase in sFID on ImageNet.
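The abstract does not say which identity transform is used to tame outliers. As an illustration only, the sketch below assumes an orthogonal rotation by a normalized Hadamard matrix H, for which x W = (x H)(Hᵀ W) holds exactly. It shows that the product is unchanged while a single large activation is spread across all channels, which lowers the range a quantizer has to cover. The helper names and the example values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(r) for r in zip(*m)]


def hadamard(n):
    """Normalized Sylvester-type Hadamard matrix; n must be a power of two.
    Its rows are orthonormal, so H times its transpose is the identity."""
    h = [[1.0]]
    while len(h) < n:
        h = [r + r for r in h] + [r + [-v for v in r] for r in h]
    s = n ** -0.5
    return [[v * s for v in r] for r in h]


# One activation row with a single outlier channel (made-up values).
x = [[0.1, -0.2, 8.0, 0.3]]
# A weight matrix with one output column (made-up values).
w = [[0.5], [-1.0], [0.25], [2.0]]

h = hadamard(4)
x_rot = matmul(x, h)             # rotated activations
w_rot = matmul(transpose(h), w)  # counter-rotated weights

y_ref = matmul(x, w)             # original product
y_rot = matmul(x_rot, w_rot)     # same product after the rotation

if __name__ == "__main__":
    print("max |x|     :", max(abs(v) for v in x[0]))
    print("max |x H|   :", max(abs(v) for v in x_rot[0]))
    print("difference  :", abs(y_ref[0][0] - y_rot[0][0]))
```

Because the rotation is folded into the weights ahead of time, only the activation side needs the extra multiplication at inference in a scheme of this kind; whether HQ-DiT arranges the computation this way is not stated in the abstract.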

arXiv information

Authors	Wenxuan Liu, Sai Qian Zhang
Published	2024-05-31 15:48:05+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
