Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation

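Abstract

We present EDAFormer, an Encoder-Decoder Attention Transformer that consists of an Embedding-Free Transformer (EFT) encoder and an all-attention decoder built on our Embedding-Free Attention (EFA) structure.
The proposed EFA is a new global context modeling mechanism that focuses on providing global non-linearity rather than on the specific roles of the query, key, and value.
For the decoder, we explore a structure optimized to take global context into account, which can improve semantic segmentation performance.
In addition, we propose a new Inference Spatial Reduction (ISR) method to improve computational efficiency.
Unlike previous spatial reduction attention methods, our ISR method further reduces the key-value resolution at the inference phase, which can narrow the computation-performance trade-off gap for efficient semantic segmentation.
EDAFormer achieves state-of-the-art performance with efficient computation compared with existing transformer-based semantic segmentation models on three public benchmarks: ADE20K, Cityscapes, and COCO-Stuff.
Furthermore, our ISR method reduces the computational cost by up to 61% with minimal mIoU degradation on the Cityscapes dataset.
The code is available at https://github.com/hyunwoo137/EDAFormer.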

Abstract (original)

We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of the Embedding-Free Transformer (EFT) encoder and the all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on functioning the global non-linearity, not the specific roles of the query, key and value. For the decoder, we explore the optimized structure for considering the globality, which can improve the semantic segmentation performance. In addition, we propose a novel Inference Spatial Reduction (ISR) method for the computational efficiency. Different from the previous spatial reduction attention methods, our ISR method further reduces the key-value resolution at the inference phase, which can mitigate the computation-performance trade-off gap for the efficient semantic segmentation. Our EDAFormer shows the state-of-the-art performance with the efficient computation compared to the existing transformer-based semantic segmentation models in three public benchmarks, including ADE20K, Cityscapes and COCO-Stuff. Furthermore, our ISR method reduces the computational cost by up to 61% with minimal mIoU performance degradation on Cityscapes dataset. The code is available at https://github.com/hyunwoo137/EDAFormer.
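
The abstract describes two ideas only at a high level: attention that does not depend on specific query, key, and value roles, and a key-value resolution that is reduced further at inference time. The pure-Python sketch below is one possible reading of that description, not the authors' implementation. It assumes that "embedding-free" means no learned projections (the input acts as the query, and a pooled copy of it acts as both key and value), that the reduction is average pooling, and that tokens form a 1D sequence rather than a 2D feature map. The function names and the ratios 2 and 4 are hypothetical.

```python
import math


def avg_pool_tokens(tokens, ratio):
    """Average every `ratio` consecutive tokens (a 1D stand-in for spatial reduction)."""
    pooled = []
    for start in range(0, len(tokens), ratio):
        group = tokens[start:start + ratio]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_free_attention(tokens, ratio):
    """Attention with no learned projections (an assumption, see the lead-in).

    Returns the attended tokens and the number of query-key pairs scored,
    which serves as a rough proxy for the attention cost.
    """
    reduced = avg_pool_tokens(tokens, ratio)  # shared key and value
    scale = math.sqrt(len(tokens[0]))
    output = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in reduced]
        weights = softmax(scores)
        output.append(
            [sum(w * v[d] for w, v in zip(weights, reduced)) for d in range(len(query))]
        )
    return output, len(tokens) * len(reduced)


# 16 toy tokens with 4 channels each (hypothetical data).
tokens = [[float(i % 3), float(i % 5), 1.0, 0.5] for i in range(16)]

# Hypothetical ratios: 2 stands in for the training-time reduction,
# 4 for the larger reduction applied only at inference.
train_out, train_pairs = embedding_free_attention(tokens, ratio=2)
infer_out, infer_pairs = embedding_free_attention(tokens, ratio=4)
print(train_pairs, infer_pairs)  # prints "128 64"
```

Because this version has no learned weights tied to the key-value resolution, the same function runs unchanged with either ratio: the output keeps one row per input token while the number of scored query-key pairs halves. How much accuracy such a change costs in a trained model is an empirical question that this sketch does not address.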

arXiv information

Authors	Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang
Published	2024-07-24 13:24:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
