DiTFastAttn: Attention Compression for Diffusion Transformer Models

Summary

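Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to the quadratic complexity of the self-attention operator.
We propose DiTFastAttn, a post-training compression method that alleviates this computational bottleneck in DiT.
We identify three key redundancies in the attention computation during DiT inference: (1) spatial redundancy, where many attention heads focus on local information;
(2) temporal redundancy, where the attention outputs of neighboring steps are highly similar;
(3) conditional redundancy, where conditional and unconditional inference exhibit significant similarity.
We propose three techniques to reduce these redundancies: (1) Window Attention with Residual Sharing, to reduce spatial redundancy;
(2) Attention Sharing across Timesteps, to exploit the similarity between steps;
(3) Attention Sharing across CFG, to skip redundant computation during conditional generation.
We apply DiTFastAttn to DiT and PixArt-Sigma for image generation tasks, and to OpenSora for video generation tasks.
For image generation, our method reduces attention FLOPs by up to 76% and achieves up to a 1.8x end-to-end speedup for high-resolution (2k x 2k) generation.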
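
To make the quadratic-cost and FLOP-reduction claims concrete, here is a rough cost model, offered as a sketch rather than the authors' accounting. It assumes attention cost is dominated by the two matrix products (QK^T and the weighted sum over V) at about 2·N·K·d FLOPs each per head, and it ignores the occasional full-attention pass that residual sharing would need. The function names, the per-step `plan`, and the values of `N`, `D`, `H`, and `W` are all hypothetical and are not taken from the paper.

```python
def attention_flops(num_tokens, head_dim, num_heads, window=None):
    """Approximate FLOPs of QK^T plus the weighted sum over V for one layer.

    Each product costs about 2 * N * K * d FLOPs per head, where K is the
    number of keys each query attends to (K = N for full attention).
    """
    keys_per_query = num_tokens if window is None else min(window, num_tokens)
    return 2 * 2 * num_tokens * keys_per_query * head_dim * num_heads


def layer_step_flops(strategy, share_cfg, num_tokens, head_dim, num_heads, window):
    """Attention FLOPs for one layer at one denoising step under CFG."""
    if strategy == "share_timestep":
        per_branch = 0  # reuse the cached output of the previous step
    elif strategy == "window":
        per_branch = attention_flops(num_tokens, head_dim, num_heads, window)
    elif strategy == "full":
        per_branch = attention_flops(num_tokens, head_dim, num_heads)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    branches = 1 if share_cfg else 2  # CFG sharing computes only one branch
    return per_branch * branches


# Hypothetical settings, chosen only for illustration.
N, D, H, W = 4096, 64, 16, 512
plan = [("full", False), ("window", False), ("share_timestep", False), ("window", True)]

baseline = len(plan) * layer_step_flops("full", False, N, D, H, W)
compressed = sum(layer_step_flops(s, cfg, N, D, H, W) for s, cfg in plan)
reduction = 1 - compressed / baseline
print(f"attention FLOP reduction for this toy plan: {reduction:.1%}")
```

Doubling the token count quadruples the full-attention cost in this model, while windowed attention grows only linearly in the token count. That is why the savings are most visible at high resolution.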

Summary (original)

Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to the quadratic complexity of self-attention operators. We propose DiTFastAttn, a post-training compression method to alleviate the computational bottleneck of DiT. We identify three key redundancies in the attention computation during DiT inference: (1) spatial redundancy, where many attention heads focus on local information; (2) temporal redundancy, with high similarity between the attention outputs of neighboring steps; (3) conditional redundancy, where conditional and unconditional inferences exhibit significant similarity. We propose three techniques to reduce these redundancies: (1) Window Attention with Residual Sharing to reduce spatial redundancy; (2) Attention Sharing across Timesteps to exploit the similarity between steps; (3) Attention Sharing across CFG to skip redundant computations during conditional generation. We apply DiTFastAttn to DiT, PixArt-Sigma for image generation tasks, and OpenSora for video generation tasks. Our results show that for image generation, our method reduces up to 76% of the attention FLOPs and achieves up to 1.8x end-to-end speedup at high-resolution (2k x 2k) generation.
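
The abstract names Window Attention with Residual Sharing without describing its mechanism. One plausible reading, which is an assumption here rather than something the abstract states, is that the difference between full and windowed attention output is cached at one step and added back to the cheap windowed output at later steps. The toy sketch below illustrates that reading in pure Python on a 1D token sequence (real image tokens form a 2D grid). The `attention` helper and the `WindowAttentionWithResidual` class are hypothetical names, not the authors' code.

```python
import math


def attention(q, k, v, window=None):
    """Single-head softmax attention over lists of vectors.

    If `window` is given, query i attends only to keys j with |i - j| <= window.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        idx = [j for j in range(n) if window is None or abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx)) / total
                    for c in range(len(v[0]))])
    return out


class WindowAttentionWithResidual:
    """Windowed attention plus a cached (full - windowed) residual."""

    def __init__(self, window):
        self.window = window
        self.residual = None  # filled in on the first call

    def __call__(self, q, k, v):
        local = attention(q, k, v, self.window)
        if self.residual is None:
            # Pay for full attention once and remember what the window misses.
            full = attention(q, k, v)
            self.residual = [[f - l for f, l in zip(fr, lr)]
                             for fr, lr in zip(full, local)]
        # Later calls compute only the windowed part and reuse the residual.
        return [[l + r for l, r in zip(lr, rr)]
                for lr, rr in zip(local, self.residual)]
```

On the first call the output matches full attention up to floating-point error. On later calls only the windowed computation is paid for, and the result is an approximation whose quality depends on how slowly the residual changes between steps.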

arXiv information

Authors	Zhihang Yuan, Hanling Zhang, Pu Lu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang
Published	2024-10-18 12:05:21+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
