Theoretical Benefit and Limitation of Diffusion Language Model

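Summary

Diffusion language models have emerged as a promising approach to text generation.
Because multiple tokens can be sampled in parallel during each diffusion step, one would naturally expect this method to be an efficient alternative to autoregressive models.
However, its efficiency-accuracy trade-off is not yet well understood.
This paper presents a rigorous theoretical analysis of the Masked Diffusion Model (MDM), a widely used type of diffusion language model, and finds that its effectiveness depends heavily on the target evaluation metric.
Under mild conditions, the authors prove that when perplexity is the metric, MDMs can achieve near-optimal perplexity with a number of sampling steps that does not depend on sequence length, showing that efficiency can be gained without sacrificing performance.
However, when the metric is the sequence error rate, which matters for judging the "correctness" of a sequence such as a reasoning chain, they show that the number of sampling steps required to obtain "correct" sequences must scale linearly with sequence length.
This eliminates the MDM's efficiency advantage over autoregressive models.
The analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs.
All theoretical findings are supported by empirical studies.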

Abstract (original)

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate–which is important for understanding the ‘correctness’ of a sequence, such as a reasoning chain–we show that the required sampling steps must scale linearly with sequence length to obtain ‘correct’ sequences, thereby eliminating MDM’s efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.
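The two metrics contrasted in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions (perplexity as the exponential of the negative mean per-token log-probability, and sequence error rate as the fraction of sampled sequences that fail a correctness check); the paper's formal definitions may differ in detail. The function names, the toy samples, and the set of "correct" sequences are hypothetical and chosen only for illustration.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the negative mean per-token log-probability (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def sequence_error_rate(samples, is_correct):
    """Fraction of sampled sequences that fail the correctness check."""
    errors = sum(1 for s in samples if not is_correct(s))
    return errors / len(samples)


# Toy data (assumed for illustration, not taken from the paper).
# Four tokens, each assigned probability 0.5 by some reference model.
log_probs = [math.log(0.5)] * 4
ppl = perplexity(log_probs)  # approximately 2.0

# A sequence counts as correct only if every token is right,
# so one wrong token makes the whole sequence an error.
correct_sequences = {"1+1=2", "2+2=4"}
samples = ["1+1=2", "1+1=3", "2+2=4", "2+2=4"]
ser = sequence_error_rate(samples, lambda s: s in correct_sequences)  # 1 of 4 fails

print(f"perplexity = {ppl:.3f}, sequence error rate = {ser:.2f}")
```

Perplexity averages over tokens, while the sequence error rate is all-or-nothing per sequence. That difference is the intuition for why the two metrics can call for very different numbers of sampling steps.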

arXiv information

Authors	Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He
Published	2025-02-13 18:59:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
