Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

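Abstract

Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering high quality and scalability.
In practice, however, they suffer from inherent instability in their dynamic features, which amplifies errors during cached inference.
Through systematic analysis, we identify the absence of a long-range feature-preservation mechanism as the root cause of unstable feature propagation and sensitivity to perturbations.
To address this, we propose Skip-DiT, a new DiT variant enhanced with Long-Skip-Connections (LSCs), the key efficiency component of U-Nets.
Theoretical spectral-norm analysis and visualizations show how LSCs stabilize feature dynamics.
The Skip-DiT architecture and its stabilized dynamic features enable an efficient static caching mechanism that reuses deep features across timesteps while updating shallow components.
Extensive experiments on image and video generation tasks show that Skip-DiT achieves (1) 4.4x training acceleration and faster convergence, and (2) 1.5-2x inference acceleration without quality loss and with high fidelity to the original output, outperforming existing DiT caching methods across a range of quantitative metrics.
Our findings establish long-skip connections as a critical architectural component for training stable and efficient diffusion transformers.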

Abstract (original)

Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. However, their practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. Through systematic analysis, we identify the absence of long-range feature preservation mechanisms as the root cause of unstable feature propagation and perturbation sensitivity. To this end, we propose Skip-DiT, a novel DiT variant enhanced with Long-Skip-Connections (LSCs) – the key efficiency component in U-Nets. Theoretical spectral norm and visualization analysis demonstrate how LSCs stabilize feature dynamics. The Skip-DiT architecture and its stabilized dynamic features enable an efficient static caching mechanism that reuses deep features across timesteps while updating shallow components. Extensive experiments across image and video generation tasks demonstrate that Skip-DiT achieves: (1) 4.4 times training acceleration and faster convergence, (2) 1.5-2 times inference acceleration without quality loss and with high fidelity to the original output, outperforming existing DiT caching methods across various quantitative metrics. Our findings establish long-skip connections as critical architectural components for training stable and efficient diffusion transformers.
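
The caching idea in the abstract (recompute the shallow components at every timestep, reuse the deep features) can be illustrated with a toy sketch. This is not the Skip-DiT implementation: the names `ToySkipModel`, `make_block`, and `sample`, the scalar "blocks", the averaging used to merge the long skip, and the fixed refresh interval are all assumptions made for illustration only.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector, float], Vector]


def make_block(scale: float) -> Block:
    """Toy stand-in for a transformer block: a residual update that depends on t."""
    def block(x: Vector, t: float) -> Vector:
        return [v + scale * (v + t) for v in x]
    return block


class ToySkipModel:
    """First (shallow) block -> deep stack -> last (shallow) block, with one long skip.

    Hypothetical layout for illustration; not the architecture from the paper.
    """

    def __init__(self, depth: int = 4) -> None:
        self.first = make_block(0.10)
        self.deep = [make_block(0.05) for _ in range(depth)]
        self.last = make_block(0.10)
        self.cached_deep: Optional[Vector] = None
        self.deep_calls = 0  # counts how often the deep stack is actually run

    def forward(self, x: Vector, t: float, use_cache: bool = False) -> Vector:
        shallow = self.first(x, t)  # shallow feature: recomputed at every timestep
        if use_cache and self.cached_deep is not None:
            deep = self.cached_deep  # reuse the deep feature from an earlier timestep
        else:
            deep = shallow
            for blk in self.deep:
                deep = blk(deep, t)
            self.deep_calls += 1
            self.cached_deep = deep
        # Long skip: merge the fresh shallow feature with the deep feature.
        # Averaging is an assumed simplification of the merge step.
        merged = [0.5 * (s + d) for s, d in zip(shallow, deep)]
        return self.last(merged, t)


def sample(model: ToySkipModel, x: Vector, timesteps: List[float], interval: int) -> Vector:
    """Run the model over all timesteps, refreshing the deep cache every `interval` steps."""
    for i, t in enumerate(timesteps):
        x = model.forward(x, t, use_cache=(i % interval != 0))
    return x


timesteps = [1.0 - i / 8 for i in range(8)]
full_model, cached_model = ToySkipModel(), ToySkipModel()
out_full = sample(full_model, [1.0, -0.5], timesteps, interval=1)
out_cached = sample(cached_model, [1.0, -0.5], timesteps, interval=2)
print(full_model.deep_calls, cached_model.deep_calls)
```

With a refresh interval of 2, the deep stack runs on half of the timesteps while the first and last blocks still see the current input through the long skip. The sketch only shows where the compute is saved; how closely cached outputs track uncached ones in a real model is what the paper's experiments measure.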

arXiv information

Authors	Guanjie Chen,Xinyu Zhao,Yucheng Zhou,Xiaoye Qu,Tianlong Chen,Yu Cheng
Published	2025-03-28 16:15:19+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
