Round and Round We Go! What makes Rotary Positional Encodings useful?

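Summary

Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information.
One of the most popular encodings used in LLMs today is Rotary Positional Encodings (RoPE), which rotate the queries and keys based on their relative distance.
A common belief is that RoPE is useful because it helps decay token dependency as relative distance increases.
In this work, we argue that this is unlikely to be the core reason.
We study the internals of a trained Gemma 7B model to understand how RoPE is used at a mechanical level.
We find that Gemma learns to use RoPE to construct robust "positional" attention patterns by exploiting the highest frequencies.
We also find that, in general, Gemma greatly prefers to use the lowest frequencies of RoPE, which we suspect are used to carry semantic information.
We mathematically prove interesting behaviours of RoPE and conduct experiments to verify our findings, proposing a modification of RoPE that fixes some of the highlighted issues and improves performance.
We believe this work is an interesting step toward better understanding PEs in LLMs.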

Abstract (original)

Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most popular types of encoding used today in LLMs are Rotary Positional Encodings (RoPE), that rotate the queries and keys based on their relative distance. A common belief is that RoPE is useful because it helps to decay token dependency as relative distance increases. In this work, we argue that this is unlikely to be the core reason. We study the internals of a trained Gemma 7B model to understand how RoPE is being used at a mechanical level. We find that Gemma learns to use RoPE to construct robust ‘positional’ attention patterns by exploiting the highest frequencies. We also find that, in general, Gemma greatly prefers to use the lowest frequencies of RoPE, which we suspect are used to carry semantic information. We mathematically prove interesting behaviours of RoPE and conduct experiments to verify our findings, proposing a modification of RoPE that fixes some highlighted issues and improves performance. We believe that this work represents an interesting step in better understanding PEs in LLMs, which we believe holds crucial value for scaling LLMs to large sizes and context lengths.
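The abstract's statement that RoPE rotates queries and keys "based on their relative distance" can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the helper names (`rope_frequencies`, `apply_rope`), the base of 10000, the pairing of consecutive vector entries, and the toy vectors are all choices made here for the example. It rotates each pair of entries by an angle proportional to the token position and checks that the query-key dot product is unchanged when both positions are shifted by the same amount.

```python
import cmath
import math


def rope_frequencies(dim, base=10000.0):
    """Return dim/2 rotation frequencies, ordered from highest to lowest.

    The base of 10000 is an assumed, commonly used default.
    """
    return [base ** (-2.0 * k / dim) for k in range(dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by the angle position * frequency.

    Each pair (vec[2k], vec[2k+1]) is treated as one complex number
    (an assumed pairing convention; real implementations differ).
    """
    out = []
    for k, theta in enumerate(rope_frequencies(len(vec), base)):
        pair = complex(vec[2 * k], vec[2 * k + 1])
        rotated = pair * cmath.exp(1j * position * theta)
        out.extend([rotated.real, rotated.imag])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unnormalised attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (arbitrary illustrative values).
query = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
key = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.1, 0.7]

# Same relative distance (3) at two different absolute offsets.
score_near = dot(apply_rope(query, 5), apply_rope(key, 2))
score_far = dot(apply_rope(query, 105), apply_rope(key, 102))

print(math.isclose(score_near, score_far, abs_tol=1e-9))
```

In this sketch the first pair of entries rotates at the highest frequency and the last pair at the lowest, which is the frequency split the abstract refers to when it contrasts "positional" attention patterns with semantic information.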

arXiv information

Authors	Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković
Published	2025-05-13 14:11:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
