Discrete Diffusion in Large Language and Multimodal Models: A Survey

Abstract

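In this work, we provide a systematic survey of discrete diffusion language models (dLLMs) and discrete diffusion multimodal language models (dMLLMs).
Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy.
This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception.
These capabilities were previously difficult to achieve with AR models.
Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as many open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x faster inference.
Progress in discrete diffusion LLMs and MLLMs has been driven largely by advances in two domains.
The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference.
The second is the evolution of the mathematical models underlying discrete diffusion.
Together, these advances catalyzed a surge in dLLM and dMLLM research in early 2025. In this work, we present a comprehensive overview of research in the dLLM and dMLLM domains.
We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models.
We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains.
We conclude by discussing future directions for research and deployment.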
Paper collection: https://github.com/LiQiiiii/DLLM-Survey

Abstract (original)

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception. These capabilities were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second contributing domain is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLM and dMLLM research in early 2025. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey
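The abstract's central mechanism, multi-token parallel decoding by iterative denoising, can be illustrated with a short sketch. It assumes a masked (absorbing-state) discrete diffusion setup with a confidence-based unmasking schedule, which is only one of several remasking strategies in use; the abstract does not specify one. The names `MASK`, `toy_denoiser`, and `parallel_unmask_decode` are hypothetical, and `toy_denoiser` returns random predictions in place of a real full-attention network.

```python
import math
import random

MASK = -1  # hypothetical id for the [MASK] token


def toy_denoiser(tokens, vocab_size, rng):
    """Hypothetical stand-in for a full-attention denoising network.

    Returns a (predicted_token, confidence) pair for every position. A real
    dLLM would compute these from logits conditioned on the whole sequence;
    here masked positions get random logits purely for illustration.
    """
    preds = []
    for t in tokens:
        if t != MASK:
            preds.append((t, 1.0))  # already-decoded tokens are kept as-is
            continue
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
        total = sum(exps)
        best = max(range(vocab_size), key=lambda k: exps[k])
        preds.append((best, exps[best] / total))
    return preds


def parallel_unmask_decode(length, steps, vocab_size=50, seed=0):
    """Start fully masked; at each step commit the most confident predictions."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    history = []
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_denoiser(tokens, vocab_size, rng)
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        # Unmask the k highest-confidence positions in parallel.
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:k]:
            tokens[i] = preds[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    final, trace = parallel_unmask_decode(length=8, steps=4)
    for n, snapshot in enumerate(trace, 1):
        print(f"step {n}: {snapshot}")
```

With `length=8` and `steps=4`, the loop commits two tokens per denoiser call, so the sequence is fully decoded after four calls rather than the eight that token-by-token AR decoding would need. Committing more tokens per step is where the inference speedups come from; reducing the step count aggressively may cost output quality.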

arXiv information

Authors	Runpeng Yu, Qi Li, Xinchao Wang
Published	2025-06-16 17:59:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
