LieRE: Generalizing Rotary Position Encodings

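Abstract

Transformer architectures rely on position encodings to capture dependencies between tokens.
Rotary Position Encoding (RoPE) has become a popular choice in language models because it encodes relative position information efficiently through key-query rotations.
However, RoPE has significant limitations outside language processing: it is restricted to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity.
We address these challenges with Lie Relative Encodings (LieRE), which replace RoPE's block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity.
In extensive evaluation on three image datasets spanning 2D and 3D classification tasks, LieRE achieves a 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, and generalizes better to higher resolutions.
Our implementation is computationally efficient: the CIFAR100 results can be reproduced in 30 minutes on 4 A100 GPUs, and we release our code to facilitate further research.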

Abstract (original)

Transformer architectures rely on position encodings to capture token dependencies. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which replaces RoPE’s block-2D rotation matrix with a learned, dense, high-dimensional rotation matrix of variable sparsity. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves 2% relative improvement over state-of-the-art baselines on 2D tasks and 1.5% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR100, and we release our code to facilitate further research.
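
To make the idea of a dense, learned rotation concrete, here is a minimal NumPy sketch. It is not the authors' released code. The abstract does not say how the rotation is parameterized; the sketch assumes it is the matrix exponential of a position-weighted sum of skew-symmetric generators, which is what the "Lie" in the name suggests. The helper names `skew` and `rotation` are hypothetical, random values stand in for learned parameters, and the variable-sparsity aspect is not modeled.

```python
import numpy as np


def skew(params, d):
    """Build a d x d skew-symmetric matrix from d*(d-1)/2 free parameters."""
    a = np.zeros((d, d))
    a[np.triu_indices(d, k=1)] = params
    return a - a.T


def rotation(generators, pos):
    """Return R(pos) = exp(sum_i pos[i] * A_i) for skew-symmetric generators A_i."""
    a = np.tensordot(pos, generators, axes=1)  # (d, d), still skew-symmetric
    # -1j * a is Hermitian, so eigh gives real eigenvalues w and unitary v,
    # and exp(a) = v @ diag(exp(1j * w)) @ v^H.
    w, v = np.linalg.eigh(-1j * a)
    return ((v * np.exp(1j * w)) @ v.conj().T).real


rng = np.random.default_rng(0)
d, n_dims = 8, 2  # head dimension and number of position axes (2D image patches)
# Random stand-ins for learned parameters (assumption: one generator per axis).
generators = np.stack(
    [skew(0.1 * rng.normal(size=d * (d - 1) // 2), d) for _ in range(n_dims)]
)

q, k = rng.normal(size=d), rng.normal(size=d)
r_q = rotation(generators, np.array([2.0, 3.0]))  # query patch at (2, 3)
r_k = rotation(generators, np.array([5.0, 1.0]))  # key patch at (5, 1)
score = (r_q @ q) @ (r_k @ k)  # position-aware attention logit

# exp of a skew-symmetric matrix is orthogonal.
orthogonal = np.allclose(r_q @ r_q.T, np.eye(d))
# With a single position axis the rotations commute, so R(2)^T R(5) = R(3).
g1 = generators[:1]
relative = np.allclose(
    rotation(g1, np.array([2.0])).T @ rotation(g1, np.array([5.0])),
    rotation(g1, np.array([3.0])),
)
print(orthogonal, relative)
```

Both checks should come out true: the rotation is orthogonal, and in the one-dimensional case the query-key product depends only on the position difference, as in RoPE. With several position axes the dense generators generally do not commute, so in this sketch the relative property is not guaranteed to hold exactly.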

arXiv information

Authors	Sophie Ostmeier,Brian Axelrod,Michael E. Moseley,Akshay Chaudhari,Curtis Langlotz
Published	2025-02-18 16:52:29+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
