Prediction hubs are context-informed frequent tokens in LLMs

Summary

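Hubness, the tendency of a few points to appear among the nearest neighbours of a disproportionately large number of other points, commonly arises when standard distance measures are applied to high-dimensional data, and it often harms distance-based analysis.
Because autoregressive large language models (LLMs) operate on high-dimensional representations, the authors ask whether they too are affected by hubness.
They first prove that the only large-scale representation comparison performed by LLMs, namely the comparison between context and unembedding vectors that determines continuation probabilities, is not characterized by the concentration-of-distances phenomenon that typically gives rise to nuisance hubness.
They then show empirically that this comparison still produces a high degree of hubness, but that the hubs in this case are not a disturbance.
Rather, they result from context-modulated frequent tokens that often appear in the pool of likely candidates for next-token prediction.
When other distances are used to compare LLM representations, however, the same theoretical guarantees do not hold, and nuisance hubs do indeed appear.
There are two main takeaways.
First, although hubness is omnipresent in high-dimensional spaces, it is not a negative property that needs to be mitigated when LLMs are used for next-token prediction.
Second, when representations from LLMs are compared using Euclidean or cosine distance, there is a high risk of nuisance hubs, and practitioners should apply mitigation techniques where relevant.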
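
As an illustration of the definition above, hubness is commonly quantified through the k-occurrence count: the number of query points that have a given point among their k nearest neighbours. The following is a minimal NumPy sketch, not the authors' code. The function names `k_occurrence` and `skewness`, the choice of Euclidean distance, and the default `k=10` are all assumptions made for this example.

```python
import numpy as np

def k_occurrence(points: np.ndarray, queries: np.ndarray, k: int = 10) -> np.ndarray:
    """Count how often each point is among the k nearest neighbours
    (Euclidean distance) of the query points. Requires k <= len(points)."""
    # Squared Euclidean distances via ||q||^2 - 2 q.p + ||p||^2,
    # giving an array of shape (n_queries, n_points).
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ points.T
        + (points ** 2).sum(axis=1)
    )
    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return np.bincount(nearest.ravel(), minlength=points.shape[0])

def skewness(counts: np.ndarray) -> float:
    """Sample skewness of the k-occurrence distribution.
    A strongly positive value indicates a few points with very high counts."""
    c = counts.astype(float)
    std = c.std()
    if std == 0.0:
        return 0.0
    return float(((c - c.mean()) ** 3).mean() / std ** 3)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.standard_normal((200, 64))   # hypothetical candidate points
    queries = rng.standard_normal((500, 64))  # hypothetical query points
    counts = k_occurrence(points, queries, k=10)
    print("max k-occurrence:", counts.max(), "skewness:", skewness(counts))
```

A heavily right-skewed count distribution means that a small number of points act as hubs. The sketch only measures hubness and does not distinguish nuisance hubs from benign ones.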

Summary (original)

Hubness, the tendency for a few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when applying standard distance measures to high-dimensional data, often negatively impacting distance-based analysis. As autoregressive large language models (LLMs) operate on high-dimensional representations, we ask whether they are also affected by hubness. We first prove that the only large-scale representation comparison operation performed by LLMs, namely that between context and unembedding vectors to determine continuation probabilities, is not characterized by the concentration of distances phenomenon that typically causes the appearance of nuisance hubness. We then empirically show that this comparison still leads to a high degree of hubness, but the hubs in this case do not constitute a disturbance. They are rather the result of context-modulated frequent tokens often appearing in the pool of likely candidates for next token prediction. However, when other distances are used to compare LLM representations, we do not have the same theoretical guarantees, and, indeed, we see nuisance hubs appear. There are two main takeaways. First, hubness, while omnipresent in high-dimensional spaces, is not a negative property that needs to be mitigated when LLMs are being used for next token prediction. Second, when comparing representations from LLMs using Euclidean or cosine distance, there is a high risk of nuisance hubs and practitioners should use mitigation techniques if relevant.
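
The comparison between context and unembedding vectors described in the abstract can be sketched as follows: each context vector is scored against every row of the unembedding matrix, the scores are turned into continuation probabilities with a softmax, and tokens that land in the top-k candidates of many contexts are counted. This is a hedged illustration only. The names `softmax` and `prediction_hub_counts`, the top-k criterion, and the toy arrays are assumptions for this example, not the procedure used in the paper.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def prediction_hub_counts(contexts: np.ndarray, unembedding: np.ndarray, k: int = 10) -> np.ndarray:
    """For each vocabulary item, count the contexts in which it is among the
    k most probable continuations. Requires k <= vocabulary size.

    contexts:    (n_contexts, d) final-layer context vectors (assumed given)
    unembedding: (vocab_size, d) unembedding matrix (assumed given)
    """
    probs = softmax(contexts @ unembedding.T)  # (n_contexts, vocab_size)
    # Indices of the k largest probabilities per context (unordered within the k).
    top = np.argpartition(-probs, k - 1, axis=1)[:, :k]
    return np.bincount(top.ravel(), minlength=unembedding.shape[0])

if __name__ == "__main__":
    # Toy data: three contexts, three "tokens", two dimensions.
    contexts = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    unembedding = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
    print(prediction_hub_counts(contexts, unembedding, k=1))
```

With real model activations in place of the toy arrays, tokens with very high counts would be candidate prediction hubs. Whether they are benign would still need to be checked against token frequency, as the abstract argues.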

arXiv information

Authors	Beatrix M. G. Nielsen, Iuri Macocco, Marco Baroni
Published	2025-06-02 07:26:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
