Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles

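Summary

Diffusion-based large language models (dLLMs) have emerged as a promising alternative to conventional autoregressive LLMs: they generate tokens in parallel, which substantially reduces inference latency.
However, existing dLLM sampling strategies, such as confidence-based and semi-autoregressive decoding, tend to behave statically, which leaves efficiency suboptimal and flexibility limited.
This paper proposes SlowFast Sampling, a new dynamic sampling strategy that adaptively alternates between an exploratory decoding stage and an accelerated decoding stage.
The method is guided by three golden principles (the certainty principle, the convergence principle, and the positional principle), which govern when and where tokens can be decoded confidently and efficiently.
The strategy is further integrated with dLLM-Cache to reduce redundant computation.
Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to a 15.63$\times$ speedup on LLaDA with minimal accuracy drop, and up to 34.22$\times$ when combined with caching.
Notably, the approach outperforms strong autoregressive baselines such as LLaMA3 8B in throughput, showing that well-designed sampling can unlock the full potential of dLLMs for fast, high-quality generation.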
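
As a rough illustration of the confidence-based parallel decoding that the abstract refers to, the sketch below unmasks, in one step, every position whose predicted confidence reaches a threshold. It is not the authors' SlowFast algorithm: the function name `decode_step`, the `MASK` placeholder, the threshold value, and the toy predictions are all hypothetical, and a real dLLM would recompute the predictions after every step.

```python
from typing import List, Optional, Tuple

MASK = None  # hypothetical placeholder for a position not yet decoded


def decode_step(
    tokens: List[Optional[str]],
    predictions: List[Tuple[str, float]],
    threshold: float,
) -> List[Optional[str]]:
    """Unmask every still-masked position whose confidence meets the threshold.

    If no masked position qualifies, unmask only the single most confident
    one, so that repeated calls always make progress.
    """
    masked = [i for i, tok in enumerate(tokens) if tok is MASK]
    if not masked:
        return list(tokens)

    # Positions confident enough to be decoded in parallel in this step.
    chosen = [i for i in masked if predictions[i][1] >= threshold]
    if not chosen:
        # Fallback: decode the one most confident masked position.
        chosen = [max(masked, key=lambda i: predictions[i][1])]

    out = list(tokens)
    for i in chosen:
        out[i] = predictions[i][0]
    return out


# Toy (token, confidence) pairs standing in for a model's output (assumed).
predictions = [("the", 0.97), ("cat", 0.55), ("sat", 0.92), ("down", 0.40)]
step1 = decode_step([MASK, MASK, MASK, MASK], predictions, threshold=0.9)
step2 = decode_step(step1, predictions, threshold=0.9)
```

With these toy values, the first step decodes two positions at once and the second falls back to a single position, which is the kind of fixed-rule behavior the abstract describes as static.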

Abstract (original)

Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: certainty principle, convergence principle, and positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63$\times$ speedup on LLaDA with minimal accuracy drop, and up to 34.22$\times$ when combined with caching. Notably, our approach outperforms strong autoregressive baselines like LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast and high-quality generation.

arXiv information

Authors	Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Dongrui Liu, Linfeng Zhang
Published	2025-06-12 16:08:28+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
