Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

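Summary

In this study, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting.
We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with.
High knowledge entropy indicates that the model draws on a wide range of memory sources, while low knowledge entropy indicates that it relies on specific sources with greater certainty.
Our analysis shows that knowledge entropy declines consistently as pretraining advances.
We also find that this decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, which leads us to conclude that diminishing knowledge entropy (fewer active memory sources) impairs the model's knowledge acquisition and retention capabilities.
We find further support for this conclusion by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.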

Summary (original)

In this work, we investigate how a model’s tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with; high knowledge entropy indicates that the model utilizes a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model’s ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (smaller number of active memory sources) impairs the model’s knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model’s capacity for knowledge acquisition and retention.
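
The abstract does not state how knowledge entropy is computed, so the following is only a minimal sketch of the intuition. It assumes that a model's use of its memory sources can be summarized as a list of nonnegative weights, and that knowledge entropy behaves like the Shannon entropy of those weights after normalization. The function name `knowledge_entropy` and the example weights are hypothetical and are not taken from the paper.

```python
import math


def knowledge_entropy(usage):
    """Shannon entropy (in nats) of normalized memory-source usage weights.

    `usage` is a hypothetical list of nonnegative weights, one per memory
    source. This is an illustrative assumption, not the paper's definition.
    """
    if any(u < 0 for u in usage):
        raise ValueError("usage weights must be nonnegative")
    total = sum(usage)
    if total <= 0:
        raise ValueError("usage must contain at least one positive weight")
    probs = [u / total for u in usage]
    # Terms with p == 0 contribute nothing to the entropy and are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up weights: four sources used equally vs. one dominant source.
broad = [1.0, 1.0, 1.0, 1.0]
narrow = [0.97, 0.01, 0.01, 0.01]

print(knowledge_entropy(broad))   # equals log(4), the maximum for four sources
print(knowledge_entropy(narrow))  # much lower: usage is concentrated
```

Under these assumptions, the evenly spread weights give the highest possible entropy for four sources, while the concentrated weights give a value close to zero, which mirrors the abstract's contrast between engaging a wide range of memory sources and relying on a few specific ones.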

arXiv information

Authors	Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo
Published	2024-12-02 08:43:16+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
