Token Weighting for Long-Range Language Modeling

Summary

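Many applications of large language models (LLMs) require long-context understanding, but models continue to struggle with such tasks.
We hypothesize that conventional next-token prediction training could contribute to this, because each token is assigned equal weight.
Yet, intuitively, the amount of context needed to predict the next token accurately varies greatly across different data.
To reflect this, we propose various novel token-weighting schemes that assign a different weight to each training token in the loss, thereby generalizing existing work.
To this end, we categorize token-weighting methods using a two-step framework that scores tokens by comparing the confidence of a long-context model with that of a short-context model.
We evaluate all methods on multiple long-context understanding tasks and show that non-uniform loss weights help improve the long-context abilities of LLMs.
A variety of short-context models can be used effectively for token scoring, including models that are much smaller than the long-context model being trained.
Overall, this work contributes to a better understanding of the trade-offs that long-context language modeling faces, and provides empirically grounded guidelines for steering models via loss weighting.
The code is available on GitHub.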

Summary (original)

Many applications of large language models (LLMs) require long-context understanding, but models continue to struggle with such tasks. We hypothesize that conventional next-token prediction training could contribute to this, because each token is assigned equal weight. Yet, intuitively, the amount of context needed to predict the next token accurately varies greatly across different data. To reflect this, we propose various novel token-weighting schemes that assign different weights to each training token in the loss, thereby generalizing existing works. For this, we categorize token-weighting methods using a two-step framework which compares the confidences of a long-context and short-context model to score tokens. We evaluate all methods on multiple long-context understanding tasks and show that non-uniform loss weights are helpful to improve the long-context abilities of LLMs. Different short-context models can be used effectively for token scoring, including models that are much smaller than the long-context model that is trained. All in all, this work contributes to a better understanding of the trade-offs long-context language modeling faces and provides guidelines for model steering via loss-weighting based on empirical evidence. The code can be found on GitHub.
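The two-step framework described above (score each token by comparing a long-context and a short-context model, then turn the scores into per-token loss weights) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the log-probability of the correct next token is already available from both models, and the scoring rule (log-probability gain of the long-context model over the short-context model), the softmax-with-temperature weighting, and the names `token_scores`, `scores_to_weights`, and `weighted_nll` are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def token_scores(long_logprobs: Sequence[float], short_logprobs: Sequence[float]) -> List[float]:
    """Step 1 (hypothetical scoring rule): score each token by how much more
    confident the long-context model is in the true next token than the
    short-context model, measured as a log-probability difference."""
    if len(long_logprobs) != len(short_logprobs):
        raise ValueError("both models must score the same tokens")
    return [lp_long - lp_short for lp_long, lp_short in zip(long_logprobs, short_logprobs)]


def scores_to_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Step 2 (hypothetical weighting rule): softmax over the sequence,
    rescaled so the weights average to 1. Very large temperatures give nearly
    uniform weights, i.e. ordinary next-token prediction."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def weighted_nll(long_logprobs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average negative log-likelihood of the long-context model.
    The weights are treated as constants (no gradient would flow through them)."""
    return -sum(w * lp for w, lp in zip(weights, long_logprobs)) / sum(weights)


# Toy sequence of three tokens: the long context only changes the prediction
# of the middle token (both models agree on the first and last).
long_lp = [math.log(0.9), math.log(0.5), math.log(0.8)]
short_lp = [math.log(0.9), math.log(0.1), math.log(0.8)]

weights = scores_to_weights(token_scores(long_lp, short_lp))
print([round(w, 3) for w in weights])  # prints [0.429, 2.143, 0.429]
print(round(weighted_nll(long_lp, weights), 3))  # prints 0.542
```

With a very large `temperature` every weight approaches 1 and the loss reduces to standard uniformly weighted next-token prediction; smaller temperatures concentrate the loss on tokens where the long context helps most. In an actual training loop one would presumably compute the weights without gradients and differentiate only the log-probabilities of the model being trained; that wiring is an assumption here, not something the abstract specifies.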

arXiv information

Authors	Falko Helm, Nico Daheim, Iryna Gurevych
Published	2025-03-12 09:46:59+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google