FsaNet: Frequency Self-attention for Semantic Segmentation

Summary

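Considering the spectral properties of images, we propose a new self-attention mechanism whose computational complexity is greatly reduced, down to a linear rate.
To better preserve edges while promoting similarity within objects, we propose processing different frequency bands individually.
In particular, we study the case where only the low-frequency components are processed.
Ablation studies show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention, even without retraining the network.
Accordingly, we design a novel plug-and-play module and embed it in the head of a CNN; we call the resulting network FsaNet.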
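Frequency self-attention 1) requires only a few low-frequency coefficients as input, 2) can be made mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) stage and the token mixing stage simultaneously.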
We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention.
Compared with other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug.
FsaNet can also enhance Mask R-CNN for instance segmentation on COCO.
In addition, the proposed module boosts Segformer across a series of models of different scales, and improves Segformer-B5 even without retraining.
Code is accessible at \url{https://github.com/zfy-csu/FsaNet}.
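
To make the idea of attending over only a few low-frequency coefficients concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not name the frequency transform, so an orthonormal 2D DCT is assumed here. The function names, the weight matrices `wq`, `wk`, `wv`, and the cutoff `k` are all hypothetical. Ordinary softmax attention is used for the token mixing step, so the sketch does not reproduce the linear structure or the spatial-domain equivalence that the abstract claims.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row u is the u-th cosine basis vector."""
    idx = np.arange(n)
    basis = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return basis

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def low_freq_self_attention(feat, k, wq, wk, wv):
    """Self-attention over the k*k lowest-frequency DCT coefficients.

    feat: (H, W, C) feature map; wq, wk, wv: (C, C) projections (hypothetical).
    Assumption: DCT is the transform; the source abstract does not specify it.
    """
    h, w, c = feat.shape
    dh, dw = dct_matrix(h)[:k], dct_matrix(w)[:k]        # (k, H) and (k, W)
    # Forward 2D DCT, truncated to the k x k lowest frequencies.
    coeff = np.einsum("uh,hwc,vw->uvc", dh, feat, dw)
    tokens = coeff.reshape(k * k, c)                     # k*k tokens, not H*W
    q, key, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ key.T / np.sqrt(q.shape[-1]))     # (k*k, k*k)
    mixed = (attn @ v).reshape(k, k, -1)
    # Inverse transform of the truncated spectrum back to H x W.
    return np.einsum("uh,uvc,vw->hwc", dh, mixed, dw)

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out = low_freq_self_attention(feat, 8, wq, wk, wv)
print(out.shape)  # (32, 32, 16)
```

In this example the attention matrix is 64 x 64 (from 8 x 8 retained coefficients) rather than 1024 x 1024 (from 32 x 32 spatial positions), which illustrates where savings in memory and FLOPs could come from under these assumptions.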

Abstract (original)

Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. By ablation study, we show that low frequency self-attention can achieve very close or better performance relative to full frequency even without retraining the network. Accordingly, we design and embed novel plug-and-play modules to the head of a CNN network that we refer to as FsaNet. The frequency self-attention 1) requires only a few low frequency coefficients as input, 2) can be mathematically equivalent to spatial domain self-attention with linear structures, 3) simplifies token mapping ($1\times1$ convolution) stage and token mixing stage simultaneously. We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than the regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test dataset and competitive results on ADE20k and VOCaug. FsaNet can also enhance Mask R-CNN for instance segmentation on COCO. In addition, utilizing the proposed module, Segformer can be boosted on a series of models with different scales, and Segformer-B5 can be improved even without retraining. Code is accessible at \url{https://github.com/zfy-csu/FsaNet}.

arXiv information

Authors	Fengyu Zhang, Ashkan Panahi, Guangjun Gao
Published	2023-07-26 08:50:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
