Accelerating Vision Diffusion Transformers with Skip Branches

Summary

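Diffusion Transformers (DiT), an emerging architecture for image and video generation, have shown great potential thanks to their high generation quality and scalability.
Despite this strong performance, practical deployment is constrained by the computational cost and redundancy of the sequential denoising process.
Feature caching across timesteps has proven effective for accelerating diffusion models, but applying it to DiT is limited by fundamental architectural differences from U-Net-based approaches.
Through an empirical analysis of DiT feature dynamics, we find that large feature variation between DiT blocks is a key obstacle to feature reuse.
To address this, we convert a standard DiT into Skip-DiT, which adds skip branches to make features smoother.
We further introduce Skip-Cache, which uses the skip branches to cache DiT features across timesteps at inference time.
We validate the proposal on several DiT backbones for video and image generation, showing that skip branches help preserve generation quality while achieving higher speedups.
Experiments show that Skip-DiT achieves a 1.5x speedup almost for free and a 2.2x speedup with only a minor reduction in quantitative metrics.
Code is available at https://github.com/OpenSparseLLMs/Skip-DiT.git.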
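
The caching idea in the summary can be illustrated with a small toy sketch. It is not the authors' implementation: the class `ToySkipDiT`, the helper `count_block_calls`, the single skip branch from the first block to the last, the additive merge, and the fixed `cache_interval` schedule are all assumptions made for illustration, loosely following DeepCache-style reuse of deep features. Timestep and text conditioning are omitted. On steps that use the cache, only the first and last blocks run, and the skip branch feeds the freshly computed shallow feature into the last block.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


class ToySkipDiT:
    """Toy block stack with one long skip branch (hypothetical layout).

    The skip branch carries the first block's output to the input of the
    last block, where it is merged with the deep features.
    """

    def __init__(self, blocks: List[Block]) -> None:
        assert len(blocks) >= 3, "need a first, a middle, and a last block"
        self.blocks = blocks
        # Output of the block just before the last one, kept across steps.
        self.cached_deep: Optional[Vector] = None

    def forward(self, x: Vector, use_cache: bool = False) -> Vector:
        # The shallow feature is always recomputed for the current step.
        shallow = self.blocks[0](x)
        if use_cache and self.cached_deep is not None:
            # Reuse deep features from an earlier step; middle blocks are skipped.
            deep = self.cached_deep
        else:
            deep = shallow
            for block in self.blocks[1:-1]:
                deep = block(deep)
            self.cached_deep = deep
        # Skip branch: merge fresh shallow features with (possibly cached) deep ones.
        merged = [s + d for s, d in zip(shallow, deep)]
        return self.blocks[-1](merged)


def count_block_calls(num_blocks: int, num_steps: int, cache_interval: int) -> int:
    """Run a toy denoising loop and return how many block evaluations it used."""
    calls = 0

    def block(x: Vector) -> Vector:
        # Stand-in for a transformer block: a cheap elementwise affine map.
        nonlocal calls
        calls += 1
        return [0.5 * v + 0.1 for v in x]

    model = ToySkipDiT([block] * num_blocks)
    x = [1.0, -1.0]
    for step in range(num_steps):
        # Full forward pass every `cache_interval` steps, cached pass otherwise.
        use_cache = step % cache_interval != 0
        eps = model.forward(x, use_cache)
        # Toy update rule, not a real diffusion sampler.
        x = [v - 0.1 * e for v, e in zip(x, eps)]
    return calls


if __name__ == "__main__":
    print(count_block_calls(6, 4, 1))  # 24: every step runs all 6 blocks
    print(count_block_calls(6, 4, 2))  # 16: cached steps run only 2 blocks
```

The block counts only show where the savings come from in this toy setup; they say nothing about the speedups reported in the paper, which depend on the real model, the cache schedule, and the hardware.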

Abstract (original)

Diffusion Transformers (DiT), an emerging image and video generation model architecture, has demonstrated great potential because of its high generation quality and scalability properties. Despite the impressive performance, its practical deployment is constrained by computational complexity and redundancy in the sequential denoising process. While feature caching across timesteps has proven effective in accelerating diffusion models, its application to DiT is limited by fundamental architectural differences from U-Net-based approaches. Through empirical analysis of DiT feature dynamics, we identify that significant feature variation between DiT blocks presents a key challenge for feature reusability. To address this, we convert standard DiT into Skip-DiT with skip branches to enhance feature smoothness. Further, we introduce Skip-Cache which utilizes the skip branches to cache DiT features across timesteps at the inference time. We validated effectiveness of our proposal on different DiT backbones for video and image generation, showcasing skip branches to help preserve generation quality and achieve higher speedup. Experimental results indicate that Skip-DiT achieves a 1.5x speedup almost for free and a 2.2x speedup with only a minor reduction in quantitative metrics. Code is available at https://github.com/OpenSparseLLMs/Skip-DiT.git.

arXiv information

Authors	Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Tianlong Chen, Cheng Yu
Published	2024-11-26 17:28:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
