Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Abstract

Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS have received increasing attention from the community. However, current methods are mainly based on convolutional neural networks and fail to explore the global information properly, thus usually resulting in incomplete object regions. In this paper, to address the aforementioned problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS. Motivated by the inherent consistency between the self-attention in Transformers and the semantic affinity, we propose an Affinity from Attention (AFA) module to learn semantic affinity from the multi-head self-attention (MHSA) in Transformers. The learned affinity is then leveraged to refine the initial pseudo labels for segmentation. In addition, to efficiently derive reliable affinity labels for supervising AFA and ensure the local consistency of pseudo labels, we devise a Pixel-Adaptive Refinement module that incorporates low-level image appearance information to refine the pseudo labels. We perform extensive experiments and our method achieves 66.0% and 38.9% mIoU on the PASCAL VOC 2012 and MS COCO 2014 datasets, respectively, significantly outperforming recent end-to-end methods and several multi-stage competitors. Code is available at https://github.com/rulixiang/afa.
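The idea of refining pseudo labels with an affinity derived from attention can be illustrated with a toy sketch. This is not the authors' AFA module, which learns the affinity during training; here a plain symmetrized attention matrix stands in for the learned affinity, and a random-walk-style propagation is assumed as the refinement step. All function names and the example values are hypothetical, and the code is kept dependency-free for illustration only.

```python
def symmetrize(attn):
    """Average an attention matrix with its transpose so a[i][j] == a[j][i]."""
    n = len(attn)
    return [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]


def row_normalize(mat):
    """Scale each row to sum to 1, giving a transition matrix."""
    out = []
    for row in mat:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


def propagate(scores, transition, steps=1):
    """Diffuse per-token class scores along the transition matrix.

    scores: n_tokens x n_classes, transition: n_tokens x n_tokens.
    """
    n_classes = len(scores[0])
    for _ in range(steps):
        scores = [
            [sum(t * s[c] for t, s in zip(t_row, scores)) for c in range(n_classes)]
            for t_row in transition
        ]
    return scores


def refine_pseudo_labels(attn, scores, steps=1):
    """Return one class index per token after affinity-based propagation.

    Assumption: symmetrized attention is used directly as the affinity,
    as a stand-in for a learned affinity.
    """
    transition = row_normalize(symmetrize(attn))
    refined = propagate(scores, transition, steps)
    return [max(range(len(row)), key=row.__getitem__) for row in refined]


# Hypothetical example: tokens 0 and 1 attend to each other, token 2 to itself.
attn = [
    [0.5, 0.5, 0.0],
    [0.7, 0.3, 0.0],
    [0.0, 0.0, 1.0],
]
# Initial class scores per token (2 classes); token 1 is weakly mislabeled.
scores = [
    [0.10, 0.90],
    [0.55, 0.45],
    [0.90, 0.10],
]
labels = refine_pseudo_labels(attn, scores)
```

In this toy case the initial per-token argmax would assign token 1 to class 0, while propagation along the affinity pulls it toward the class of token 0, which it is strongly connected to. Token 2 has no affinity to the others and keeps its label.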

arXiv information

Authors	Lixiang Ru, Yibing Zhan, Baosheng Yu, Bo Du
Published	2022-09-08 05:38:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
