Multi-manifold Attention for Vision Transformers

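Summary

Vision Transformers are very popular today because of their state-of-the-art performance on several computer vision tasks, such as image classification and action recognition.
Although their performance has been greatly improved by employing convolutional neural networks, hierarchical structures, and compact forms, there is limited research on using additional data representations to refine the attention map derived from the multi-head attention of a Transformer network.
This work proposes a novel attention mechanism, called multi-manifold attention, that can replace the standard attention mechanism in a Transformer-based network.
The proposed attention models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite, and Grassmann, which have different statistical and geometrical properties. This guides the network to take into account a rich set of information describing the appearance, color, and texture of an image when computing a highly descriptive attention map.
In this way, a Vision Transformer with the proposed attention is guided to attend more to discriminative features, leading to improved classification results, as shown by experiments on several well-known image classification datasets.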

Abstract (original)

Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although the performance of Vision Transformers has been greatly improved by employing Convolutional Neural Networks, hierarchical structures and compact forms, there is limited research on ways to utilize additional data representations to refine the attention map derived from the multi-head attention of a Transformer network. This work proposes a novel attention mechanism, called multi-manifold attention, that can substitute any standard attention mechanism in a Transformer-based network. The proposed attention models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, with different statistical and geometrical properties, guiding the network to take into consideration a rich set of information that describes the appearance, color and texture of an image, for the computation of a highly descriptive attention map. In this way, a Vision Transformer with the proposed attention is guided to become more attentive towards discriminative features, leading to improved classification results, as shown by the experimental results on several well-known image classification datasets.
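
The abstract does not spell out how the three manifolds are combined, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, reshaping each token into a small matrix, using a regularized covariance as the SPD point with the log-Euclidean distance, using the top-`p` left singular vectors as the Grassmann point with the projection distance, and fusing the three attention maps by simple averaging.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def token_matrices(x, rows):
    # Assumption: view each d-dimensional token as a (rows, d // rows) matrix.
    n, d = x.shape
    assert d % rows == 0, "token dimension must be divisible by rows"
    return x.reshape(n, rows, d // rows)


def spd_points(x, rows, eps=1e-3):
    # Regularized covariance of each token matrix: an SPD matrix per token.
    m = token_matrices(x, rows)
    m = m - m.mean(axis=2, keepdims=True)
    cov = m @ m.transpose(0, 2, 1) / (m.shape[2] - 1)
    return cov + eps * np.eye(rows)


def spd_log(s):
    # Batched matrix logarithm of SPD matrices via eigendecomposition.
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)[:, None, :]) @ v.transpose(0, 2, 1)


def grassmann_projectors(x, rows, p):
    # Projection matrix onto the span of the top-p left singular vectors.
    u, _, _ = np.linalg.svd(token_matrices(x, rows), full_matrices=False)
    basis = u[:, :, :p]
    return basis @ basis.transpose(0, 2, 1)


def pairwise_frobenius(a, b):
    # Frobenius distance between every matrix in a and every matrix in b.
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def multi_manifold_attention(x, wq, wk, wv, rows=4, p=2):
    q, k, v = x @ wq, x @ wk, x @ wv
    # Euclidean branch: standard scaled dot-product attention.
    a_euc = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # SPD branch: log-Euclidean distance between token covariances.
    d_spd = pairwise_frobenius(spd_log(spd_points(q, rows)),
                               spd_log(spd_points(k, rows)))
    # Grassmann branch: projection distance between token subspaces.
    d_gr = pairwise_frobenius(grassmann_projectors(q, rows, p),
                              grassmann_projectors(k, rows, p)) / np.sqrt(2)
    # Smaller distance means higher attention; fusion by averaging is assumed.
    attn = (a_euc + softmax(-d_spd) + softmax(-d_gr)) / 3.0
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 16))  # 6 tokens of dimension 16
    wq, wk, wv = (rng.normal(size=(16, 16)) / 4.0 for _ in range(3))
    out, attn = multi_manifold_attention(x, wq, wk, wv)
    print(out.shape, attn.shape)
```

Because each branch yields a row-stochastic map, their average is also a valid attention map. A learned fusion step could replace the averaging without changing the rest of the sketch.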

arXiv information

Authors	Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras
Published	2022-07-18 12:53:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
