Improving Transformers with Dynamically Composable Multi-Head Attention

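Summary

Multi-Head Attention (MHA) is a key component of the Transformer.
In MHA, attention heads work independently, which causes problems such as a low-rank bottleneck in the attention score matrices and head redundancy.
We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter- and computation-efficient attention architecture that addresses the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads.
At the core of DCMHA is a $\it{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way.
DCMHA can be used as a drop-in replacement for MHA in any Transformer architecture to obtain the corresponding DCFormer.
DCFormer significantly outperforms Transformer across different architectures and model scales in language modeling, matching the performance of models with about 1.7x to 2.0x the compute.
For example, DCPythia-6.9B outperforms the open-source Pythia-12B on both pretraining perplexity and downstream task evaluation.
The code and models are available at https://github.com/Caiyun-AI/DCFormer.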

Abstract (original)

Multi-Head Attention (MHA) is a key component of Transformer. In MHA, attention heads work independently, causing problems such as low-rank bottleneck of attention score matrices and head redundancy. We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter and computation efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. At the core of DCMHA is a $\it{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way. DCMHA can be used as a drop-in replacement of MHA in any transformer architecture to obtain the corresponding DCFormer. DCFormer significantly outperforms Transformer on different architectures and model scales in language modeling, matching the performance of models with ~1.7x-2.0x compute. For example, DCPythia-6.9B outperforms open source Pythia-12B on both pretraining perplexity and downstream task evaluation. The code and models are available at https://github.com/Caiyun-AI/DCFormer.
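The abstract does not spell out the Compose function, so the following is only a minimal sketch of what "composing attention heads in an input-dependent way" could look like, not the authors' implementation. It assumes a hypothetical static head-mixing matrix `w_static` plus a hypothetical per-query gate computed from the token representation via `w_gate`; the function names, the `tanh` gate, and the shapes are all illustrative assumptions. For the actual method, see the repository linked above.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def compose(a, x, w_static, w_gate):
    """Mix the head axis of a[h][i][j] at every (query i, key j) position.

    a:        per-head matrices, shape [H][T][S] (scores or weights)
    x:        query-side token representations, shape [T][D]
    w_static: [H][H] input-independent head-mixing matrix (assumption)
    w_gate:   [H][D] projection producing one gate per head (assumption)
    """
    n_heads, n_q, n_k = len(a), len(a[0]), len(a[0][0])
    out = [[[0.0] * n_k for _ in range(n_q)] for _ in range(n_heads)]
    for i in range(n_q):
        # The gate depends on the input token, which makes the mixing dynamic.
        gate = [math.tanh(g) for g in matvec(w_gate, x[i])]
        for j in range(n_k):
            heads = [a[h][i][j] for h in range(n_heads)]
            mixed = matvec(w_static, heads)  # static cross-head mixing
            for h in range(n_heads):
                out[h][i][j] = mixed[h] + gate[h] * heads[h]
    return out


def softmax_rows(a):
    """Apply a numerically stable softmax over the key axis."""
    out = []
    for head in a:
        rows = []
        for row in head:
            m = max(row)
            exps = [math.exp(v - m) for v in row]
            z = sum(exps)
            rows.append([e / z for e in exps])
        out.append(rows)
    return out


def dc_attention_weights(scores, x, pre, post):
    """Compose the scores, apply softmax, then compose the weights.

    pre and post are (w_static, w_gate) pairs for the two compose steps.
    """
    composed_scores = compose(scores, x, *pre)
    weights = softmax_rows(composed_scores)
    return compose(weights, x, *post)
```

With `w_static` set to the identity matrix and `w_gate` set to all zeros, both compose steps leave their input unchanged and the sketch reduces to ordinary per-head softmax attention weights. This illustrates the drop-in property the abstract describes, under the assumptions stated above.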

arXiv information

Authors	Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
Published	2024-05-14 12:41:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
