CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers

Summary

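The Transformer architecture has proven to be a powerful tool for a wide range of tasks.
It is based on the self-attention mechanism, an inherently expensive operation with quadratic computational complexity: memory usage and compute time grow quadratically with the length of the input sequence, which limits where Transformers can be applied.
This work proposes a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST) to optimize the attention computation and achieve efficient Transformers.
CAST uses learnable surrogate tokens to construct a cluster affinity matrix, which is used to cluster the input sequence and to generate novel cluster summaries.
The self-attention computed within each cluster is then combined with the cluster summaries of the other clusters, enabling information flow across the entire input sequence.
CAST improves efficiency by reducing the complexity from $O(N^2)$ to $O(\alpha N)$, where $N$ is the sequence length and $\alpha$ is a constant determined by the number of clusters and the number of samples per cluster.
We show that CAST performs better than or comparably to the baseline Transformer on long-range sequence modeling tasks, while also achieving better time and memory efficiency than other efficient Transformers.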
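
The mechanism summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `clustered_attention`, the projection matrices `Wq`, `Wk`, `Wv`, the hard argmax assignment, and the affinity-weighted mixing rule are all assumptions chosen for brevity. The summary only states that surrogate tokens produce a cluster affinity matrix, that the sequence is clustered, and that intra-cluster attention is combined with summaries of the other clusters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def clustered_attention(X, S, Wq, Wk, Wv):
    """Illustrative clustered self-attention (hypothetical simplification).

    X: (N, d) input tokens.
    S: (Nc, d) surrogate tokens (learnable in a real model, fixed here).
    Wq, Wk, Wv: (d, d) projection matrices (assumed, not from the source).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scale = np.sqrt(Q.shape[1])
    n_clusters = S.shape[0]

    # Cluster affinity matrix: how strongly each token relates to each
    # surrogate token. Each row sums to 1.
    A = softmax(Q @ S.T / scale, axis=1)              # (N, Nc)

    # Assumption: hard assignment of each token to its highest-affinity cluster.
    assign = A.argmax(axis=1)                         # (N,)

    # Cluster summaries: affinity-weighted average of the value vectors.
    summaries = (A / A.sum(axis=0, keepdims=True)).T @ V   # (Nc, d)

    out = np.zeros_like(V)
    for c in range(n_clusters):
        idx = np.where(assign == c)[0]
        if idx.size == 0:
            continue  # no token was assigned to this cluster
        # Ordinary softmax attention restricted to the tokens of cluster c.
        intra = softmax(Q[idx] @ K[idx].T / scale) @ V[idx]
        # Affinities to the *other* clusters weight their summaries.
        other = A[idx].copy()
        other[:, c] = 0.0
        # Rows of A sum to 1, so this is a convex combination of the
        # intra-cluster result and the summaries of the other clusters.
        out[idx] = A[idx, c:c + 1] * intra + other @ summaries
    return out
```

With a single surrogate token every token falls into one cluster and the sketch reduces to ordinary full self-attention, which is a convenient sanity check. Note that an argmax assignment does not guarantee equally sized clusters, so this sketch by itself does not deliver the $O(\alpha N)$ cost; bounding the number of tokens per cluster would be needed for that.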

Summary (original)

The Transformer architecture has shown to be a powerful tool for a wide range of tasks. It is based on the self-attention mechanism, which is an inherently computationally expensive operation with quadratic computational complexity: memory usage and compute time increase quadratically with the length of the input sequences, thus limiting the application of Transformers. In this work, we propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST), to optimize the attention computation and achieve efficient transformers. CAST utilizes learnable surrogate tokens to construct a cluster affinity matrix, used to cluster the input sequence and generate novel cluster summaries. The self-attention from within each cluster is then combined with the cluster summaries of other clusters, enabling information flow across the entire input sequence. CAST improves efficiency by reducing the complexity from $O(N^2)$ to $O(\alpha N)$ where $N$ is the sequence length, and $\alpha$ is constant according to the number of clusters and samples per cluster. We show that CAST performs better than or comparable to the baseline Transformers on long-range sequence modeling tasks, while also achieving higher results on time and memory efficiency than other efficient transformers.
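
The abstract does not spell out $\alpha$, so the following is only one plausible reading, not the paper's derivation. Assume (hypothetically) that the $N$ tokens are split into $N_c$ clusters of $\kappa$ tokens each, that every token attends to the $\kappa$ tokens of its own cluster, and that every token also reads the $N_c$ cluster summaries. Counting query-key pairs and ignoring constant factors:

```latex
\underbrace{N_c \cdot \kappa^2}_{\text{intra-cluster attention}}
  + \underbrace{N \cdot N_c}_{\text{cluster summaries}}
  = N\kappa + N N_c
  = (\kappa + N_c)\, N ,
\qquad \text{using } N = N_c\,\kappa .
```

Under these assumptions $\alpha = \kappa + N_c$. For example, with $N = 4096$, $N_c = 16$ and $\kappa = 256$, full attention scores $N^2 = 16{,}777{,}216$ pairs, while the clustered count is $4096 \cdot 272 = 1{,}114{,}112$, roughly 15 times fewer.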

arXiv information

Authors	Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera
Published	2024-02-06 18:47:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
