Adaptive Caching for Faster Video Generation with Diffusion Transformers

Summary

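Generating temporally consistent, high-fidelity video is computationally expensive, especially over longer time spans.
Recent Diffusion Transformers (DiTs) have made significant progress on this task, but they rely on larger models and heavier attention mechanisms, which makes the problem worse and slows inference.
This paper introduces Adaptive Caching (AdaCache), a training-free method for accelerating video DiTs. It is motivated by the observation that "not all videos are created equal": some videos need fewer denoising steps than others to reach reasonable quality.
Building on this, the method not only caches computations across the diffusion process but also devises a caching schedule tailored to each video generation, maximizing the quality-latency trade-off.
It also introduces a Motion Regularization (MoReg) scheme that uses video information within AdaCache, essentially controlling the compute allocation based on motion content.
Altogether, these plug-and-play contributions yield significant inference speedups (for example, up to 4.7x on Open-Sora 720p, 2-second video generation) without sacrificing generation quality, across multiple video DiT baselines.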
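
The caching idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the change metric, the thresholds, or how motion enters the schedule, so `l1_distance`, `steps_to_skip`, the threshold values, and the `motion_score` multiplier are all assumptions made for illustration. The sketch only shows the general pattern of reusing a cached block output for a number of denoising steps that depends on how much the output changed recently.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def steps_to_skip(
    change: float,
    thresholds: Sequence[Tuple[float, int]] = ((0.05, 3), (0.15, 1)),
) -> int:
    """Map a change metric to how many upcoming steps may reuse the cache.

    The (limit, skip) pairs are hypothetical values, not taken from the paper.
    """
    for limit, skip in thresholds:
        if change < limit:
            return skip
    return 0  # large change: recompute at the next step


def run_with_adaptive_cache(
    block: Callable[[float], Vector],
    inputs_per_step: Sequence[float],
    motion_score: float = 0.0,
) -> Tuple[List[Vector], int]:
    """Run `block` over the denoising steps, reusing cached outputs when allowed.

    `motion_score` is an assumed stand-in for motion regularization: a larger
    value inflates the change metric, so high-motion content skips fewer steps.
    Returns the per-step outputs and the number of real block evaluations.
    """
    cached = None
    skip_left = 0
    computed = 0
    outputs: List[Vector] = []
    for x in inputs_per_step:
        if cached is not None and skip_left > 0:
            outputs.append(cached)  # reuse the cached result, no computation
            skip_left -= 1
            continue
        out = block(x)
        computed += 1
        if cached is not None:
            change = l1_distance(out, cached) * (1.0 + motion_score)
            skip_left = steps_to_skip(change)
        cached = out
        outputs.append(out)
    return outputs, computed


if __name__ == "__main__":
    # Stand-in for an expensive transformer block.
    toy_block = lambda x: [x, x]
    steps = [1.0, 1.01, 1.02, 1.03, 1.04, 2.0, 2.0]
    _, n_static = run_with_adaptive_cache(toy_block, steps)
    _, n_motion = run_with_adaptive_cache(toy_block, steps, motion_score=10.0)
    print("block evaluations:", n_static, "vs with high motion:", n_motion)
```

In this toy setup the schedule is decided per input sequence rather than fixed in advance, which is the property the summary attributes to AdaCache; a fixed schedule would skip the same steps regardless of content.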

Abstract (original)

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) — despite making significant headway in this context — have only heightened such challenges as they rely on larger models and heavier attention mechanisms, resulting in slower inference speeds. In this paper, we introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache), which is motivated by the fact that ‘not all videos are created equal’: meaning, some videos require fewer denoising steps to attain a reasonable quality than others. Building on this, we not only cache computations through the diffusion process, but also devise a caching schedule tailored to each video generation, maximizing the quality-latency trade-off. We further introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, essentially controlling the compute allocation based on motion content. Altogether, our plug-and-play contributions grant significant inference speedups (e.g. up to 4.7x on Open-Sora 720p – 2s video generation) without sacrificing the generation quality, across multiple video DiT baselines.

arXiv information

Authors	Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie
Published	2024-11-07 17:06:32+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
