FsaNet: Frequency Self-attention for Semantic Segmentation

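Summary

We propose a new self-attention mechanism that exploits the spectral properties of images to reduce computational complexity substantially, down to a linear rate.
To preserve edges better while promoting similarity within objects, we propose processing different frequency bands separately.
In particular, we study the case where processing is applied only to the low-frequency components.
An ablation study shows that low-frequency self-attention performs very close to, or better than, full-frequency self-attention, even without retraining the network.
Accordingly, we design new plug-and-play modules and embed them in the head of a CNN, which we call FsaNet.
Frequency self-attention 1) takes low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) stage and the token mixing stage simultaneously.
We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention.
Compared with other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscape test dataset and competitive results on ADE20k and VOCaug.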
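
The equivalence claim in point 2 can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not name the transform or give the attention formula, so the orthonormal DCT, the softmax-free attention `(Q K^T) V / N`, the random projection weights, and every name used here (`dct_matrix`, `linear_attention`, `K`) are assumptions made for illustration. The sketch checks that, when all frequencies are kept, attention computed on transformed tokens and mapped back matches attention computed on spatial tokens, and then shows how keeping only a `K x K` block of low-frequency coefficients shrinks the token count.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    i = np.arange(n)
    k = i.reshape(-1, 1)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d

def linear_attention(tokens, wq, wk, wv, norm):
    """Softmax-free attention (assumed form): (Q K^T) V / norm, tokens of shape (N, C)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    return (q @ k.T) @ v / norm

rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 4, 3                       # hypothetical feature-map and band sizes
x = rng.standard_normal((H * W, C))           # spatial tokens, row-major flattening
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))

# Separable 2D DCT acting on row-major flattened tokens; orthonormal, shape (HW, HW).
d2 = np.kron(dct_matrix(H), dct_matrix(W))

# Full-frequency case: attend in the frequency domain, then transform back.
y_spatial = linear_attention(x, wq, wk, wv, norm=H * W)
y_freq = d2.T @ linear_attention(d2 @ x, wq, wk, wv, norm=H * W)
full_equivalent = np.allclose(y_spatial, y_freq)

# Low-frequency case: keep only the K x K lowest-frequency coefficients as tokens.
low_idx = (np.arange(K)[:, None] * W + np.arange(K)[None, :]).ravel()
d_low = d2[low_idx]                           # (K*K, HW)
z_low = d_low @ x                             # (K*K, C) low-frequency tokens
y_low = d_low.T @ linear_attention(z_low, wq, wk, wv, norm=H * W)  # back to (HW, C)

print("full-frequency result matches spatial attention:", full_equivalent)
print("tokens: spatial =", H * W, ", low-frequency =", K * K)
```

The full-frequency match holds because the transform is orthonormal and the attention contains no softmax, so the transform cancels inside the product. The low-frequency output `y_low` is only an approximation of `y_spatial`; how close it gets in practice is what the ablation study in the paper evaluates.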

Abstract (original)

Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. By ablation study, we show that low frequency self-attention can achieve very close or better performance relative to full frequency even without retraining the network. Accordingly, we design and embed novel plug-and-play modules to the head of a CNN network that we refer to as FsaNet. The frequency self-attention 1) takes low frequency coefficients as input, 2) can be mathematically equivalent to spatial domain self-attention with linear structures, 3) simplifies token mapping ($1\times1$ convolution) stage and token mixing stage simultaneously. We show that the frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ less FLOPs, and $97.56\% \sim 98.18\%$ in run time than the regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on Cityscape test dataset and competitive results on ADE20k and VOCaug.

arXiv information

Authors	Fengyu Zhang, Ashkan Panahi, Guangjun Gao
Published	2022-11-28 17:49:46+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
