Pattern Attention Transformer with Doughnut Kernel

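Summary

This paper introduces the Pattern Attention Transformer (PAT), a new architecture built from a new doughnut kernel.
Compared with tokens in NLP, Transformers in computer vision must cope with the high resolution of image pixels.
The doughnut kernel inherits the patch/window idea from ViT and its follow-ups and enhances the design of patches.
It replaces line-cut boundaries with two types of areas, sensor and updating, based on an understanding of self-attention (called the QKVA grid).
The doughnut kernel also raises a new topic: the shape of kernels.
To verify its performance on image classification, PAT is designed with Transformer blocks that use regular-octagon doughnut kernels.
Its performance on ImageNet 1K surpasses the Swin Transformer (+0.7 acc1).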

Summary (original)

We present in this paper a new architecture, the Pattern Attention Transformer (PAT), that is composed of the new doughnut kernel. Compared with tokens in the NLP field, Transformer in computer vision has the problem of handling the high resolution of pixels in images. Inheriting the patch/window idea from ViT and its follow-ups, the doughnut kernel enhances the design of patches. It replaces the line-cut boundaries with two types of areas: sensor and updating, which is based on the comprehension of self-attention (named QKVA grid). The doughnut kernel also brings a new topic about the shape of kernels. To verify its performance on image classification, PAT is designed with Transformer blocks of regular octagon shape doughnut kernels. Its performance on ImageNet 1K surpasses the Swin Transformer (+0.7 acc1).
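As a rough illustration of the "two types of areas" idea, the sketch below builds two nested regular-octagon masks on a pixel grid. It is only a geometric toy, not the paper's implementation. The abstract does not say how the sensor and updating areas are laid out, so treating the updating area as the inner octagon and the sensor area as the larger enclosing octagon is an assumption. The function names `octagon_mask` and `doughnut_areas` and the `apothem` parameters are hypothetical.

```python
import numpy as np


def octagon_mask(size: int, apothem: float) -> np.ndarray:
    """Boolean mask of a regular octagon centred on a size x size grid.

    A point lies inside a regular octagon with the given apothem when it is
    within the apothem of the centre along both axes and both diagonals.
    """
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    ax, ay = np.abs(x - c), np.abs(y - c)
    return (ax <= apothem) & (ay <= apothem) & (ax + ay <= apothem * np.sqrt(2.0))


def doughnut_areas(size: int, outer: float, inner: float):
    """Return (sensor, updating) masks.

    ASSUMPTION: the updating area is the inner octagon and the sensor area is
    the larger octagon that contains it. The abstract does not specify this.
    """
    sensor = octagon_mask(size, outer)
    updating = octagon_mask(size, inner)
    return sensor, updating


sensor, updating = doughnut_areas(size=9, outer=4.0, inner=2.0)
ring = sensor & ~updating  # pixels that are sensed but not updated
```

Under this assumed layout, `ring` is the doughnut-shaped band between the two octagons. On a coarse grid the inner octagon degenerates toward a diamond, so a larger `size` gives a more faithful shape.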

arXiv information

Author	WenYuan Sheng
Published	2022-11-30 13:11:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
