Scaling Diffusion Language Models via Adaptation from Autoregressive Models

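Summary

Diffusion language models (DLMs) have emerged as a promising new paradigm for text generation and may address limitations of autoregressive (AR) models.
However, current DLMs have been studied at smaller scales than their AR counterparts, and they have not been compared fairly on language modeling benchmarks.
In addition, training diffusion models from scratch at scale remains difficult.
Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models.
We show the connection between the AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models.
Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) can be converted into the diffusion models DiffuGPT and DiffuLLaMA using fewer than 200B training tokens.
Our experimental results show that these models outperform earlier DLMs and are competitive with their AR counterparts.
We release a suite of DLMs (with 127M, 355M, and 7B parameters) that can generate fluent text, perform in-context learning, fill in the middle without prompt re-ordering, and follow instructions: https://github.com/HKUNLP/DiffuLLaMA.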

Summary (original)

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions: https://github.com/HKUNLP/DiffuLLaMA.
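
The abstract mentions a connection between the AR and diffusion objectives without spelling it out. As a rough illustration only, the sketch below contrasts next-token targets with one common form of discrete text diffusion: an absorbing-state (masking) process whose loss is cross-entropy on masked positions reweighted by 1/t. That choice of diffusion process, the `[MASK]` symbol, and every function name here are assumptions made for illustration; none of them is taken from the abstract or from the released code.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def ar_targets(tokens):
    """Autoregressive view: each prefix (left context only) predicts the next token."""
    return [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]


def mask_tokens(tokens, t, rng):
    """Forward (noising) step of an absorbing-state diffusion.

    Each token is independently replaced by MASK with probability t.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


def diffusion_loss(tokens, noisy, t, log_prob):
    """Cross-entropy over masked positions only, reweighted by 1/t (t > 0).

    log_prob(noisy, i, target) stands in for a model's log-probability of the
    clean token at position i given the whole noisy sequence (both directions).
    """
    total = 0.0
    for i, (clean, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK:
            total -= log_prob(noisy, i, clean)
    return total / t


def uniform_log_prob(noisy, i, target, vocab_size=4):
    """Toy stand-in model: equal probability for every vocabulary item."""
    return -math.log(vocab_size)


tokens = ["the", "cat", "sat", "down"]
rng = random.Random(0)
noisy = mask_tokens(tokens, t=0.5, rng=rng)
loss = diffusion_loss(tokens, noisy, 0.5, uniform_log_prob)
print(ar_targets(tokens)[0], noisy, loss)
```

Both views reduce to per-token cross-entropy. Under these assumptions they differ in which positions are predicted (the next token versus the masked tokens) and in what context the model sees (left-only versus the full noisy sequence), which gives some intuition for why continued pre-training of an AR checkpoint could be a plausible starting point.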

arXiv information

Authors	Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
Published	2024-10-23 14:04:22+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
