A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective

Summary

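Diffusion models have emerged as a powerful paradigm for modern generative modeling and show strong potential for large language models (LLMs).
Unlike conventional autoregressive (AR) models, which generate tokens sequentially, diffusion models enable parallel token sampling, which speeds up generation and removes the left-to-right generation constraint.
Despite their empirical success, the theoretical understanding of diffusion model approaches remains underdeveloped.
In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective.
Our analysis shows that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence.
In particular, we establish matching upper and lower bounds, up to a constant factor, showing that the convergence analysis is tight.
These results offer new theoretical insights into the practical effectiveness of diffusion language models.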

Summary (original)

Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially, diffusion models enable parallel token sampling, leading to faster generation and eliminating left-to-right generation constraints. Despite their empirical success, the theoretical understanding of diffusion model approaches remains underdeveloped. In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis demonstrates that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence. In particular, we establish matching upper and lower bounds, up to some constant factor, to demonstrate the tightness of our convergence analysis. These results offer novel theoretical insights into the practical effectiveness of diffusion language models.
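The abstract ties the sampling error to the mutual information between tokens. The following is a minimal toy sketch, not the paper's analysis, of where that quantity comes from. It assumes an idealized masked-diffusion sampler that, over $T$ steps, reveals one block of positions per step and draws every token in the block independently from its exact conditional given the tokens already revealed. The helper names (`marginal`, `sampler_prob`, `kl_to_sampler`, `mutual_information`), the `schedule` format, and the toy distribution `p` are all hypothetical. For a two-token sequence sampled in a single parallel step, the KL divergence from the true distribution equals the mutual information $I(X_1; X_2)$; revealing one token per step removes the error.

```python
import math

def marginal(p, positions, values):
    """Probability under joint p that the tokens at `positions` equal `values`."""
    return sum(prob for seq, prob in p.items()
               if all(seq[i] == v for i, v in zip(positions, values)))

def sampler_prob(p, schedule, seq):
    """Probability that the (hypothetical) idealized parallel sampler outputs `seq`.

    At each step, every position in the current block is drawn independently
    from its exact conditional given the tokens revealed in earlier steps.
    """
    revealed, q = [], 1.0
    for block in schedule:
        ctx_vals = [seq[i] for i in revealed]
        ctx_prob = marginal(p, revealed, ctx_vals)
        for i in block:
            # Exact conditional p(x_i | tokens revealed before this step).
            q *= marginal(p, revealed + [i], ctx_vals + [seq[i]]) / ctx_prob
        revealed = revealed + list(block)
    return q

def kl_to_sampler(p, schedule):
    """KL(p || q) in nats, where q is the sampler's output distribution."""
    return sum(prob * math.log(prob / sampler_prob(p, schedule, seq))
               for seq, prob in p.items() if prob > 0)

def mutual_information(p):
    """I(X1; X2) in nats for a joint distribution over two-token sequences."""
    return sum(prob * math.log(prob / (marginal(p, [0], [seq[0]])
                                       * marginal(p, [1], [seq[1]])))
               for seq, prob in p.items() if prob > 0)

# Hypothetical two-token "language" over the vocabulary {0, 1}: tokens tend to agree.
p = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

print(kl_to_sampler(p, [[0, 1]]))    # T = 1: both tokens sampled in parallel
print(mutual_information(p))         # same value as the line above, about 0.1927 nats
print(kl_to_sampler(p, [[0], [1]]))  # T = 2: one token per step, KL is 0
```

The enumeration is exponential in sequence length, so this only works for tiny examples. It illustrates the direction of the trade-off (fewer, larger parallel steps cost more KL when tokens are dependent) and does not reproduce the paper's $1/T$ rate or its constants.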

arXiv information

Authors	Gen Li, Changxiao Cai
Published	2025-05-27 16:24:20+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
