Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2

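Summary

This technical report describes an experiment on autoregressive pre-training of the Gemma2 2-billion-parameter large language model (LLM) on 10% of the Lithuanian language component of CulturaX, viewed from the perspective of continual learning.
We apply elastic weight consolidation (EWC) to the full set of the model's parameters and evaluate on language understanding benchmarks, namely ARC, Belebele, GSM8K, HellaSwag, MMLU, TruthfulQA, and Winogrande (each in English and Lithuanian versions), as well as on perplexity benchmarks.
We empirically demonstrate that EWC regularisation not only mitigates catastrophic forgetting but may also benefit learning of the new task with LLMs.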
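To make the regularisation concrete, here is a minimal sketch of the standard EWC objective, in which the new-task loss is augmented by a quadratic penalty (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2 that anchors each parameter to its pre-trained value theta*_i in proportion to its diagonal Fisher information F_i. This is not the authors' implementation: the function names, the plain-list parameter representation, and the values in the usage note are illustrative assumptions, and the summary does not state how the Fisher information or lambda were chosen in the report.

```python
from typing import List, Sequence


def diagonal_fisher(per_example_grads: Sequence[Sequence[float]]) -> List[float]:
    """Estimate the diagonal Fisher information as the mean of squared
    per-example log-likelihood gradients (one inner sequence per example).
    Hypothetical helper: a real model would obtain gradients via autodiff."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def ewc_loss(new_task_loss: float, theta: Sequence[float],
             theta_star: Sequence[float], fisher: Sequence[float],
             lam: float) -> float:
    """Total objective: loss on the new task plus the EWC penalty."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)
```

As a usage illustration with made-up numbers, calling diagonal_fisher on the two gradient vectors [1.0, 0.0] and [3.0, 2.0] yields per-parameter importances, and ewc_loss then penalises drift from theta_star more strongly along the parameter with the larger importance. The penalty is zero when theta equals theta_star, so training starts from the unregularised new-task loss.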

Summary (original)

This technical report describes an experiment on autoregressive pre-training of Gemma2 2 billion parameter large language model (LLM) with 10% on the Lithuanian language component of CulturaX from the point of view of continual learning. We apply elastic weight consolidation (EWC) to the full set of the model’s parameters and investigate language understanding benchmarks, consisting of Arc, Belebele, Gsm8K, Hellaswag, MMLU, TruthfulQA, and Winogrande sets (both in English and Lithuanian versions), and perplexity benchmarks. We empirically demonstrate that EWC regularisation allows us not only to mitigate catastrophic forgetting effects but also that it is potentially beneficial for learning of the new task with LLMs.

arXiv information

Authors	Vytenis Šliogeris, Povilas Daniušis, Artūras Nakvosas
Published	2025-05-09 10:43:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
