Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality

Summary

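During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally.
As a result, models can overemphasize high-frequency, low-information tokens and neglect the low-frequency tokens that are essential to the specificity and informativeness of generated content.
This paper introduces Power-Law Decay Loss (PDL), a new loss function designed specifically to optimize the finetuning process for text generation.
The core motivation for PDL comes from observations in information theory and linguistics: a token's informativeness is often inversely proportional to its frequency of occurrence.
PDL re-weights each token's contribution to the standard cross-entropy loss according to its frequency in the training corpus, following a power-law decay.
Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are assigned higher weights.
This mechanism guides the model during finetuning to focus on learning and generating tokens that convey specific, unique information, which improves the quality, diversity, and informativeness of the generated text.
We elaborate theoretically on the motivation and construction of PDL and discuss its potential applications and advantages across text generation finetuning tasks such as abstractive summarization, dialogue systems, and style transfer.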

Abstract (original)

During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are assigned higher weights. This mechanism guides the model during finetuning to focus more on learning and generating tokens that convey specific and unique information, thereby enhancing the quality, diversity, and informativeness of the generated text. We theoretically elaborate on the motivation and construction of PDL and discuss its potential applications and advantages across various text generation finetuning tasks, such as abstractive summarization, dialogue systems, and style transfer.
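
The re-weighting described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting formula, so the form used here, w(t) = 1 / (freq(t) + eps)^alpha with freq(t) the token's relative frequency in the training corpus, is an assumption, as are the function names `power_law_weights` and `pdl_loss` and the parameters `alpha` and `eps`. It is a sketch of the general idea, not the authors' implementation.

```python
import math
from collections import Counter


def power_law_weights(corpus_tokens, alpha=1.0, eps=1e-9):
    """Return a weight per token: 1 / (relative_frequency + eps) ** alpha.

    Assumed form of the power-law decay; alpha controls how strongly
    frequent tokens are down-weighted, and alpha = 0 gives weight 1.0
    for every token (plain cross-entropy).
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: 1.0 / ((c / total) + eps) ** alpha for tok, c in counts.items()}


def pdl_loss(target_tokens, predicted_probs, weights):
    """Frequency-weighted cross-entropy, averaged over target tokens.

    predicted_probs[i] is the model probability assigned to target_tokens[i].
    Every target token is assumed to appear in the weight table.
    """
    losses = [
        weights[tok] * -math.log(p)
        for tok, p in zip(target_tokens, predicted_probs)
    ]
    return sum(losses) / len(losses)


# Toy corpus: "the" is frequent, "quasar" is rare.
corpus = ["the"] * 8 + ["quasar"] * 2
weights = power_law_weights(corpus, alpha=1.0)
loss = pdl_loss(["the", "quasar"], [0.5, 0.5], weights)
```

In this toy corpus the rare token "quasar" receives a larger weight than "the", so an equally confident mistake on it contributes more to the loss. Whether and how the weights should be normalized is left open here, since the abstract does not specify it.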

arXiv information

Authors	Jintian Shao, Hongyi Huang, Jiayi Wu, Beiwen Zhang, ZhiYu Wu, You Shan, MingKai Zheng
Published	2025-05-22 16:59:26+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
