Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

Abstract

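Large language models (LLMs) have demonstrated remarkable capabilities across numerous tasks, but principled explanations of their underlying mechanisms and of several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive.
In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to gain deeper insight into LLM behavior.
By leveraging the Kolmogorov structure function and interpreting LLM compression as a two-part coding process, we give a detailed view of how LLMs acquire and store information as model and data scale increase, from pervasive syntactic patterns to progressively rarer knowledge elements.
Motivated by this theoretical perspective and by natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model.
In a Bayesian setting, we show that prediction and compression within this model naturally lead to the diverse learning and scaling behaviors of LLMs.
In particular, our theoretical analysis offers intuitive and principled explanations for data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual knowledge hallucinations in LLMs.
The experimental results validate the theoretical predictions.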
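The starting point of the abstract, that a good predictor is a good compressor, can be made concrete in a few lines of Python. This is a generic textbook sketch, not the authors' method or their Syntax-Knowledge model: it codes each symbol with the Bayesian predictive probability under a symmetric Dirichlet prior (the add-one, or Laplace, rule when `alpha = 1`), so the total code length in bits equals the negative log2 of the Bayesian marginal likelihood of the sequence. An ideal arithmetic coder gets within a couple of bits of this total. The function names and toy strings are hypothetical.

```python
import math
from collections import Counter


def sequential_code_length_bits(seq, alphabet, alpha=1.0):
    """Total bits to code `seq` symbol by symbol with Bayesian predictions.

    Hypothetical illustration: symbol x at step t is coded with the predictive
    probability p(x | seq[:t]) = (count(x) + alpha) / (t + K * alpha) under a
    symmetric Dirichlet(alpha) prior over the K-symbol alphabet. The sum of
    -log2 p equals -log2 of the Bayesian marginal likelihood of the sequence.
    """
    k = len(alphabet)
    counts = Counter()
    bits = 0.0
    for t, x in enumerate(seq):
        if x not in alphabet:
            raise ValueError(f"symbol {x!r} is not in the alphabet")
        p = (counts[x] + alpha) / (t + k * alpha)
        bits -= math.log2(p)
        # Update only after coding x, so a decoder can mirror the same model.
        counts[x] += 1
    return bits


def uniform_code_length_bits(seq, alphabet):
    """Baseline: a fixed code that spends log2(K) bits on every symbol."""
    return len(seq) * math.log2(len(alphabet))


if __name__ == "__main__":
    skewed = "a" * 90 + "b" * 10
    print(f"sequential: {sequential_code_length_bits(skewed, 'ab'):.1f} bits")
    print(f"uniform:    {uniform_code_length_bits(skewed, 'ab'):.1f} bits")
```

For the skewed string, the sequential code comes to roughly 51 bits versus exactly 100 bits for the fixed one-bit-per-symbol code: the model that learns the symbol statistics predicts better and therefore compresses better.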

Abstract (original)

Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet principled explanations for their underlying mechanisms and several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive. In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to provide deeper insights into LLM behaviors. By leveraging the Kolmogorov Structure Function and interpreting LLM compression as a two-part coding process, we offer a detailed view of how LLMs acquire and store information across increasing model and data scales, from pervasive syntactic patterns to progressively rarer knowledge elements. Motivated by this theoretical perspective and natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors of LLMs. In particular, our theoretical analysis offers intuitive and principled explanations for data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual knowledge hallucinations in LLMs. The experimental results validate our theoretical predictions.
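The Heaps' and Zipf's assumptions mentioned in the abstract can be illustrated with a small simulation. This is a hedged sketch, not the paper's Syntax-Knowledge generator: it assumes a hypothetical vocabulary of 50,000 types whose frequencies follow Zipf's law (the rank-r type has probability proportional to r^-s), samples a token stream, and measures how the number of distinct types V(N) grows with stream length N, which Heaps' law models as V(N) ≈ K·N^β with 0 < β < 1. All names and parameter values are illustrative.

```python
import math
import random
from itertools import accumulate


def zipf_heaps_curve(vocab_size=50_000, s=1.0, n_tokens=128_000, seed=0):
    """Sample a Zipfian token stream and record distinct types seen vs. tokens.

    Hypothetical setup: type r (1-indexed rank) is drawn with probability
    proportional to r**-s. Returns [(N, V(N)), ...] at N = 1000, 2000, 4000, ...
    up to n_tokens.
    """
    rng = random.Random(seed)
    cum_weights = list(accumulate(r ** -s for r in range(1, vocab_size + 1)))
    tokens = rng.choices(range(vocab_size), cum_weights=cum_weights, k=n_tokens)

    # Geometrically spaced checkpoints for a log-log fit.
    marks, n = set(), 1000
    while n <= n_tokens:
        marks.add(n)
        n *= 2

    seen, curve = set(), []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i in marks:
            curve.append((i, len(seen)))
    return curve


def fit_heaps_exponent(curve):
    """Least-squares slope of log V(N) against log N, i.e. beta in V(N) ≈ K * N**beta."""
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    curve = zipf_heaps_curve()
    for n, v in curve:
        print(f"N={n:>7}  distinct types={v:>6}")
    print(f"fitted Heaps exponent beta = {fit_heaps_exponent(curve):.2f}")
```

Because the most frequent types appear almost immediately while rare types keep surfacing as N grows, the fitted β should land strictly between 0 and 1. In a loose statistical sense this mirrors the abstract's picture of pervasive patterns being absorbed first and progressively rarer knowledge elements only at larger data scales.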

arXiv information

Authors	Zhixuan Pan, Shaowen Wang, Jian Li
Published	2025-04-21 15:18:42+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
