Agent Attention: On the Integration of Softmax and Linear Attention

Abstract

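The attention module is a key component of Transformers.
Global attention is highly expressive, but its excessive computational cost limits its applicability in many scenarios.
This paper proposes Agent Attention, a new attention paradigm that strikes a favorable balance between computational efficiency and representation power.
Specifically, Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.
The agent tokens first act as agents for the query tokens $Q$, aggregating information from $K$ and $V$, and then broadcast that information back to $Q$.
Because the number of agent tokens can be designed to be much smaller than the number of query tokens, agent attention is significantly more efficient than the widely adopted Softmax attention while preserving global context modeling capability.
Interestingly, the proposed agent attention is shown to be equivalent to a generalized form of linear attention.
Agent attention therefore seamlessly integrates the powerful Softmax attention with the highly efficient linear attention.
Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation, and image generation.
Notably, agent attention performs remarkably well in high-resolution scenarios, owing to its linear attention nature.
For instance, when applied to Stable Diffusion, agent attention accelerates generation and substantially improves image generation quality without any additional training.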
Code is available at https://github.com/LeapLabTHU/Agent-Attention.
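
The aggregation and broadcast steps described in the abstract can be written as two Softmax attention operations applied in sequence. The following is a sketch under assumptions not stated in the abstract: $N$ denotes the number of query tokens, $n$ the number of agent tokens, $d$ the channel dimension, $\sigma$ a row-wise Softmax, the number of key/value tokens is taken to equal $N$, and scaling factors are omitted.

$$
V_A = \sigma(A K^\top)\, V, \qquad
O = \sigma(Q A^\top)\, V_A
  = \underbrace{\sigma(Q A^\top)}_{\phi_q(Q)} \; \underbrace{\sigma(A K^\top)}_{\phi_k(K)^\top} \; V
$$

Read this way, the output has the linear attention form $\phi_q(Q)\,\phi_k(K)^\top V$ with Softmax-based feature maps. Evaluating it right to left costs $O(Nnd)$, compared with $O(N^2 d)$ for Softmax attention over the same tokens, so the cost grows linearly in $N$ when $n$ is held fixed.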

Abstract (original)

The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as the agent for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given the number of agent tokens can be designed to be much smaller than the number of query tokens, the agent attention is significantly more efficient than the widely adopted Softmax attention, while preserving global context modelling capability. Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.
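
The two-step mechanism in the abstract (agent tokens aggregate from $K$ and $V$, then broadcast to $Q$) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The function names, the single-head 2-D shapes, the $1/\sqrt{d}$ scaling, and the way agent tokens are formed here (averaging contiguous chunks of the queries in a hypothetical `pool_agents` helper) are all assumptions made for illustration; the abstract does not specify how agent tokens are obtained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def softmax_attention(q, k, v):
    """Standard single-head Softmax attention; q: (M, d), k and v: (N, d)."""
    scale = q.shape[-1] ** -0.5  # assumed 1/sqrt(d) scaling
    return softmax(q @ k.T * scale) @ v


def agent_attention(q, a, k, v):
    """Agent attention sketch for the quadruple (Q, A, K, V)."""
    # Step 1 (aggregation): the n agent tokens attend to all keys/values.
    agent_values = softmax_attention(a, k, v)        # (n, d)
    # Step 2 (broadcast): every query attends to the n agent tokens only.
    return softmax_attention(q, a, agent_values)     # (N, d)


def pool_agents(q, n_agents):
    """Hypothetical helper: average contiguous chunks of queries into agents."""
    chunks = np.array_split(q, n_agents, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])


rng = np.random.default_rng(0)
N, n, d = 64, 8, 16                                  # n << N in practice
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
a = pool_agents(q, n)
out = agent_attention(q, a, k, v)

# Linear-attention reading: out = phi_q(Q) @ (phi_k(K)^T @ V)
scale = d ** -0.5
phi_q = softmax(q @ a.T * scale)                     # (N, n)
phi_k_T = softmax(a @ k.T * scale)                   # (n, N)
out_linear = phi_q @ (phi_k_T @ v)
```

Neither step forms an $N \times N$ attention map: the largest intermediate matrices are $n \times N$ and $N \times n$, which is where the efficiency gain over full Softmax attention comes from when $n$ is much smaller than $N$.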

arXiv information

Authors	Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang
Published	2023-12-14 16:26:29+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
