Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution

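Summary

Diffusion models deliver groundbreaking performance on many generative modeling tasks, but they have fallen short on discrete data domains such as natural language.
Crucially, standard diffusion models rely on the well-established theory of score matching, and efforts to generalize it to discrete structures have not yielded the same empirical gains.
This work bridges the gap by proposing score entropy, a novel discrete score matching loss that is more stable than existing methods, forms an ELBO for maximum likelihood training, and can be optimized efficiently with a denoising variant.
The authors scale their Score Entropy Discrete Diffusion models (SEDD) to the experimental setting of GPT-2, achieving highly competitive likelihoods while also introducing distinct algorithmic advantages.
In particular, when similarly sized SEDD and GPT-2 models are compared, SEDD attains comparable perplexities (normally within $+10\%$ of the baseline, and sometimes better).
Furthermore, SEDD models learn a more faithful sequence distribution (around $4\times$ better than GPT-2 models with ancestral sampling, as measured by large models), can trade off compute for generation quality (needing $16\times$ fewer network evaluations to match GPT-2), and enable arbitrary infilling beyond standard left-to-right prompting.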
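
To make the "within $+10\%$ of the baseline" perplexity comparison concrete, here is a minimal Python sketch. It assumes perplexity is computed as the exponential of the mean per-token negative log-likelihood (in nats). The function names and all numbers are hypothetical, invented purely for illustration; they are not results from the paper. For a model trained through an ELBO, the negative log-likelihood is itself an upper bound, so the perplexity obtained this way would be an upper bound as well.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), NLL in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_gap(model_ppl, baseline_ppl):
    """Signed relative difference of the model's perplexity versus the baseline."""
    return (model_ppl - baseline_ppl) / baseline_ppl


# Hypothetical per-token NLLs (nats); illustrative values, not from the paper.
baseline_nlls = [3.0, 3.2, 2.8, 3.0]
model_nlls = [3.1, 3.2, 2.9, 3.1]

baseline_ppl = perplexity(baseline_nlls)
model_ppl = perplexity(model_nlls)
gap = relative_gap(model_ppl, baseline_ppl)

# "Within +10% of the baseline" means the gap is at most 0.10;
# a negative gap would mean the model outperforms the baseline.
within_10_percent = gap <= 0.10

print(f"baseline={baseline_ppl:.2f} model={model_ppl:.2f} gap={gap:+.1%}")
print("within +10% of baseline:", within_10_percent)
```

Because perplexity is exponential in the mean NLL, a small per-token difference in NLL translates into a multiplicative difference in perplexity, which is why the comparison is stated as a relative percentage.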

Summary (original)

Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel discrete score matching loss that is more stable than existing methods, forms an ELBO for maximum likelihood training, and can be efficiently optimized with a denoising variant. We scale our Score Entropy Discrete Diffusion models (SEDD) to the experimental setting of GPT-2, achieving highly competitive likelihoods while also introducing distinct algorithmic advantages. In particular, when comparing similarly sized SEDD and GPT-2 models, SEDD attains comparable perplexities (normally within $+10\%$ of and sometimes outperforming the baseline). Furthermore, SEDD models learn a more faithful sequence distribution (around $4\times$ better compared to GPT-2 models with ancestral sampling as measured by large models), can trade off compute for generation quality (needing only $16\times$ fewer network evaluations to match GPT-2), and enables arbitrary infilling beyond the standard left to right prompting.

arXiv information

Authors	Aaron Lou, Chenlin Meng, Stefano Ermon
Published	2023-10-25 17:59:12+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
