HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

Abstract

Diffusion Transformers (DiTs) excel in generative tasks but face practical deployment challenges due to high inference costs. Feature caching, which stores and retrieves redundant computations, offers the potential for acceleration. Existing learning-based caching, though adaptive, overlooks the impact of the prior timestep. It also suffers from misaligned objectives between training and inference: aligned predicted noise versus high-quality images. These two discrepancies compromise both performance and efficiency. To this end, we harmonize training and inference with a novel learning-based caching framework dubbed HarmoniCa. It first incorporates Step-Wise Denoising Training (SDT) to ensure the continuity of the denoising process, where prior steps can be leveraged. In addition, an Image Error Proxy-Guided Objective (IEPO) is applied to balance image quality against cache utilization through an efficient proxy that approximates the image error. Extensive experiments across $8$ models, $4$ samplers, and resolutions from $256\times256$ to $2K$ demonstrate the superior performance and speedup of our framework. For instance, it achieves over $40\%$ latency reduction (i.e., $2.07\times$ theoretical speedup) and improved performance on PixArt-$\alpha$. Remarkably, our image-free approach reduces training time by $25\%$ compared with the previous method. Our code is available at https://github.com/ModelTC/HarmoniCa.
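As a rough illustration of what "feature caching" means in the abstract, the sketch below runs a toy denoising loop in which each block either recomputes its residual or reuses the one cached at an earlier timestep. It is not HarmoniCa's implementation: the names `run_with_cache` and `schedule`, the scalar "features", and the fixed hand-written reuse schedule are all assumptions made for illustration, whereas learning-based caching would learn which blocks to reuse.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a transformer block: maps a feature to a residual.
Block = Callable[[float], float]


def run_with_cache(
    blocks: List[Block],
    schedule: List[List[bool]],
    x0: float,
) -> Tuple[float, int]:
    """Toy denoising loop with per-block feature caching.

    schedule[t][i] == True asks step t to reuse the cached residual of
    block i instead of recomputing it. Returns the final sample and the
    number of block evaluations actually performed.
    """
    cache: List[Optional[float]] = [None] * len(blocks)
    computed = 0
    x = x0
    for step_flags in schedule:
        h = x
        for i, block in enumerate(blocks):
            cached = cache[i]
            if step_flags[i] and cached is not None:
                residual = cached          # reuse the stored residual
            else:
                residual = block(h)        # recompute and refresh the cache
                cache[i] = residual
                computed += 1
            h = h + residual
        # Toy update rule (an assumption, not a real sampler).
        x = x - 0.1 * h
    return x, computed


# Two toy blocks and three timesteps; later steps reuse more cached residuals.
toy_blocks: List[Block] = [lambda h: 0.5 * h, lambda h: h + 1.0]
toy_schedule = [[False, False], [True, False], [True, True]]
_, calls = run_with_cache(toy_blocks, toy_schedule, x0=1.0)
full_calls = len(toy_blocks) * len(toy_schedule)
```

In this toy setup the ratio `full_calls / calls` plays the role of a theoretical speedup. The quality cost of reusing stale residuals is what a learned caching policy has to trade off against that saving.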

arXiv information

Authors	Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang
Published	2025-05-02 11:29:31+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
