Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

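Abstract

Existing transformer-based image backbones typically propagate feature information in one direction, from lower to higher levels. This may not be ideal: the localization ability needed to delineate accurate object boundaries is most prominent in the lower-level, high-resolution feature maps, whereas the semantics that distinguish image signals belonging to one object from those belonging to another typically emerge at higher levels of processing. We present Hierarchical Inter-Level Attention (HILA), an attention-based method that captures bottom-up and top-down updates between features of different levels. HILA extends hierarchical vision transformer architectures by adding local connections between higher-level and lower-level features to the backbone encoder. In each iteration, a hierarchy is constructed by having higher-level features compete for assignments to update the lower-level features that belong to them, iteratively resolving object-part relationships. These improved lower-level features are then used to re-update the higher-level features. HILA can be integrated into the majority of hierarchical architectures without any changes to the base model. We add HILA to SegFormer and the Swin Transformer and show notable accuracy improvements in semantic segmentation with fewer parameters and FLOPs. Project website and code: https://www.cs.toronto.edu/~garyleung/hila/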
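The top-down and bottom-up updates described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 1D token layout, the `window` size, the identity projections (no learned query, key, or value weights), and the names `local_mask` and `inter_level_step` are all assumptions made for illustration. "Competition" is read here as a softmax over the higher-level tokens that each lower-level token is locally connected to, which is only one plausible interpretation.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_mask(n_low, n_high, window):
    # mask[i, j] is True when lower-level token i may connect to
    # higher-level token j (hypothetical 1D locality rule).
    assert n_low % n_high == 0, "sketch assumes an integer downsampling ratio"
    ratio = n_low // n_high
    parent = np.arange(n_low) // ratio  # nearest higher-level token
    offsets = np.abs(parent[:, None] - np.arange(n_high)[None, :])
    return offsets <= window

def inter_level_step(low, high, mask):
    # One top-down update followed by one bottom-up update.
    d = low.shape[1]

    # Top-down: for each lower-level token, its candidate higher-level
    # tokens compete via a softmax over the higher-level axis.
    scores = np.where(mask, low @ high.T / np.sqrt(d), -np.inf)
    assign = softmax(scores, axis=1)          # (n_low, n_high), rows sum to 1
    low = low + assign @ high                 # residual update of lower level

    # Bottom-up: each higher-level token aggregates its (now refined)
    # lower-level tokens, with a softmax over the lower-level axis.
    scores = np.where(mask, low @ high.T / np.sqrt(d), -np.inf)
    gather = softmax(scores, axis=0)          # columns sum to 1
    high = high + gather.T @ low              # residual update of higher level
    return low, high, assign
```

Calling `inter_level_step` repeatedly on the same pair of feature sets would correspond to the iterative refinement the abstract mentions; a real model would additionally use learned projections and 2D local windows.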

Abstract (original)

Existing transformer-based image backbones typically propagate feature information in one direction from lower to higher-levels. This may not be ideal since the localization ability to delineate accurate object boundaries, is most prominent in the lower, high-resolution feature maps, while the semantics that can disambiguate image signals belonging to one object vs. another, typically emerges in a higher level of processing. We present Hierarchical Inter-Level Attention (HILA), an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels. HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder. In each iteration, we construct a hierarchy by having higher-level features compete for assignments to update lower-level features belonging to them, iteratively resolving object-part relationships. These improved lower-level features are then used to re-update the higher-level features. HILA can be integrated into the majority of hierarchical architectures without requiring any changes to the base model. We add HILA into SegFormer and the Swin Transformer and show notable improvements in accuracy in semantic segmentation with fewer parameters and FLOPS. Project website and code: https://www.cs.toronto.edu/~garyleung/hila/

arXiv information

Authors	Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler
Published	2022-07-05 15:47:31+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
