SegViT: Semantic Segmentation with Plain Vision Transformers

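Summary

We investigate the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose SegViT.
Previous ViT-based segmentation networks typically learn a pixel-level representation from the output of the ViT.
In contrast, we use the attention mechanism, a fundamental component of the ViT, to generate masks for semantic segmentation.
Specifically, we propose the Attention-to-Mask (ATM) module, in which the similarity maps between a set of learnable class tokens and the spatial feature maps are transferred to the segmentation masks.
Experiments show that SegViT with the ATM module outperforms its counterparts that use a plain ViT backbone on the ADE20K dataset and achieves new state-of-the-art performance on the COCO-Stuff-10K and PASCAL-Context datasets.
Furthermore, to reduce the computational cost of the ViT backbone, we propose query-based down-sampling (QD) and query-based up-sampling (QU) to build a Shrunk structure.
With the proposed Shrunk structure, the model saves up to $40\%$ of the computation while maintaining competitive performance.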
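
The ATM idea described in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the similarity is a scaled dot product, that a sigmoid turns each similarity map into a soft mask, and that the final label is a plain argmax over the per-class masks. The function names `attention_to_mask` and `argmax_segmentation` and the toy inputs are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_to_mask(class_tokens, features, height, width):
    """Turn token-to-feature similarity maps into soft masks.

    class_tokens: N vectors of length d (one learnable token per class).
    features: height * width vectors of length d, in row-major order.
    Returns N masks, each a height x width grid of values in (0, 1).
    """
    d = len(class_tokens[0])
    scale = 1.0 / math.sqrt(d)  # assumption: scaled dot-product similarity
    masks = []
    for token in class_tokens:
        # Similarity between this class token and every spatial position.
        sims = [scale * sum(t * f for t, f in zip(token, feat)) for feat in features]
        # Assumption: a sigmoid maps each similarity to a mask value.
        probs = [sigmoid(s) for s in sims]
        masks.append([probs[r * width:(r + 1) * width] for r in range(height)])
    return masks


def argmax_segmentation(masks):
    """Assign each position the class whose mask value is largest (simplification)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [max(range(len(masks)), key=lambda c: masks[c][r][col]) for col in range(width)]
        for r in range(height)
    ]


# Toy example: 2 class tokens, a 2 x 2 feature map, feature dimension 2.
class_tokens = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
masks = attention_to_mask(class_tokens, features, height=2, width=2)
print(argmax_segmentation(masks))
```

In this toy case each spatial feature aligns with exactly one class token, so the resulting label map alternates between the two classes. A real model would learn the class tokens and compute the features with a ViT backbone; both are fixed by hand here.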

Summary (original)

We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit. Previous ViT-based segmentation networks usually learn a pixel-level representation from the output of the ViT. Differently, we make use of the fundamental component — attention mechanism, to generate masks for semantic segmentation. Specifically, we propose the Attention-to-Mask (ATM) module, in which the similarity maps between a set of learnable class tokens and the spatial feature maps are transferred to the segmentation masks. Experiments show that our proposed SegVit using the ATM module outperforms its counterparts using the plain ViT backbone on the ADE20K dataset and achieves new state-of-the-art performance on COCO-Stuff-10K and PASCAL-Context datasets. Furthermore, to reduce the computational cost of the ViT backbone, we propose query-based down-sampling (QD) and query-based up-sampling (QU) to build a Shrunk structure. With the proposed Shrunk structure, the model can save up to $40\%$ computations while maintaining competitive performance.

arXiv information

Authors	Bowen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu
Published	2022-12-12 15:35:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
