Automatic Channel Pruning for Multi-Head Attention

Summary

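Despite the strong performance of Transformers, their quadratic computational complexity makes them challenging to apply to vision tasks.
Automatic pruning is an effective way to reduce computational complexity without relying on heuristics.
However, applying it directly to multi-head attention is not straightforward because of channel misalignment.
This paper proposes an automatic channel pruning method that takes the multi-head attention mechanism into account.
First, channel similarity-based weights are incorporated into the pruning indicator so that the more informative channels in each head are preserved.
Next, the pruning indicator is adjusted so that channels are removed in equal proportion across all heads, which prevents channel misalignment.
The method also adds a reweight module to compensate for the information lost when channels are removed, and an effective initialization step for the pruning indicator based on the difference in attention between the original structure and each channel.
The proposed method applies not only to the original attention but also to linear attention, which is more efficient because its complexity is linear in the number of tokens.
On ImageNet-1K, applying the pruning method to FLattenTransformer, which includes both attention mechanisms, yields higher accuracy at several MAC budgets than previous state-of-the-art efficient models and pruning methods.
The authors state that the code will be made available soon.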
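
The equal-proportion constraint described above can be illustrated with a small sketch. This is not the authors' implementation, which had not been released at the time of writing: the function names, the flat head-by-head score layout, and the use of a plain top-k ranking in place of a learned pruning indicator are all assumptions made for illustration. It contrasts a global ranking, which can leave heads with different channel counts, with a per-head ranking that keeps the same number of channels in every head.

```python
def select_channels_global(scores, keep_ratio):
    """Keep the top-scoring channels overall, ignoring head boundaries.

    Hypothetical baseline: heads may end up with different channel counts.
    """
    keep = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(order[:keep])


def select_channels_per_head(scores, num_heads, keep_ratio):
    """Keep the same number of channels in every head.

    scores is assumed to be a flat list of per-channel importance scores,
    laid out head by head (head 0 first, then head 1, and so on).
    """
    if len(scores) % num_heads != 0:
        raise ValueError("channel count must be divisible by num_heads")
    head_dim = len(scores) // num_heads
    keep_per_head = max(1, int(head_dim * keep_ratio))
    kept = []
    for h in range(num_heads):
        start = h * head_dim
        head_scores = scores[start:start + head_dim]
        # Rank channels within this head only, then map back to global indices.
        order = sorted(range(head_dim), key=lambda c: head_scores[c], reverse=True)
        kept.extend(sorted(start + c for c in order[:keep_per_head]))
    return kept


# Two heads of four channels each; head 0 happens to score higher everywhere.
scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
print(select_channels_global(scores, 0.5))       # [0, 1, 2, 3]: head 1 loses every channel
print(select_channels_per_head(scores, 2, 0.5))  # [0, 1, 6, 7]: two channels per head
```

With the global ranking, the surviving channels can no longer be reshaped into heads of equal width, which is one way to read the "channel misalignment" the summary mentions. The per-head ranking keeps the head dimension uniform.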

Abstract (original)

Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is one of effective methods for reducing computation complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method to take into account the multi-head attention mechanism. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust pruning indicator to enforce removal of channels in equal proportions across all heads, preventing the channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for pruning indicator based on difference of attention between original structure and each channel. Our proposed method can be used to not only original attention, but also linear attention, which is more efficient as linear complexity with respect to the number of tokens. On ImageNet-1K, applying our pruning method to the FLattenTransformer, which includes both attention mechanisms, shows outperformed accuracy for several MACs compared with previous state-of-the-art efficient models and pruned methods. Code will be available soon.
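
The quadratic versus linear complexity mentioned in the abstract can be made concrete with a rough multiply-accumulate (MAC) count. The sketch below uses the standard operation counts for the two matrix products in each attention variant and ignores projections, softmax, and normalization. The function names and the example sizes (196 tokens, 8 heads, 64 channels per head) are assumptions chosen for illustration and are not taken from the paper.

```python
def softmax_attention_macs(num_tokens, head_dim, num_heads):
    """MACs for Q @ K^T plus (attention weights) @ V.

    Each product costs N * N * d MACs per head, so the total is
    quadratic in the number of tokens N and linear in the head width d.
    """
    return 2 * num_tokens * num_tokens * head_dim * num_heads


def linear_attention_macs(num_tokens, head_dim, num_heads):
    """MACs for K^T @ V plus Q @ (K^T V).

    Each product costs N * d * d MACs per head, so the total is
    linear in the number of tokens N and quadratic in the head width d.
    """
    return 2 * num_tokens * head_dim * head_dim * num_heads


# Illustrative sizes only: 196 tokens, 8 heads, 64 channels per head.
for tokens in (196, 392):
    for dim in (64, 32):
        print(tokens, dim,
              softmax_attention_macs(tokens, dim, 8),
              linear_attention_macs(tokens, dim, 8))
```

Under these counts, halving the channels per head halves the cost of softmax attention and quarters the cost of linear attention, which suggests why channel pruning is relevant to both mechanisms.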

arXiv information

Authors	Eunho Lee, Youngbae Hwang
Published	2024-05-31 14:47:20+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
