MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation

Summary

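Since their emergence, Convolutional Neural Networks (CNNs) have driven significant progress in medical image analysis.
However, because the convolution operator is local, CNNs can be limited in capturing global, long-range interactions.
Transformers have recently gained popularity in the computer vision community, and in medical image segmentation in particular, because they process global features effectively.
Their adoption may have been held back by the scalability issues of the self-attention mechanism and by the lack of a CNN-like inductive bias.
Hybrid vision transformers (CNN-Transformer), which exploit the advantages of both convolution and self-attention, have therefore gained importance.
This work presents MaxViT-UNet, a new encoder-decoder, UNet-type hybrid vision transformer (CNN-Transformer) for medical image segmentation.
The proposed Hybrid Decoder is designed to use both convolution and self-attention at every decoding stage with only a nominal memory and computational burden.
Including multi-axis self-attention within each decoder stage significantly enhances the ability to discriminate between object and background regions, which helps improve segmentation efficiency.
A new block is also proposed within the Hybrid Decoder.
Its fusion process begins by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features from the hybrid encoder.
The fused features are then refined with a multi-axis attention mechanism.
The proposed decoder block is repeated multiple times to segment the nuclei regions progressively.
Experimental results on the MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.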
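
The decoder stage described above (transpose-convolution upsampling, fusion with skip features, multi-axis attention) can be illustrated with a small bookkeeping sketch. It is not the authors' implementation. It assumes that fusion is channel concatenation and that multi-axis attention follows the MaxViT scheme of local block attention plus sparse grid attention; neither detail is stated in the abstract. All function names, sizes, and channel counts are hypothetical, and the code only tracks shapes and which positions would attend to each other.

```python
def transposed_conv_size(size, kernel=2, stride=2, padding=0):
    """Output length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def block_groups(height, width, window):
    """Group positions into non-overlapping window x window blocks (local axis)."""
    groups = {}
    for i in range(height):
        for j in range(width):
            groups.setdefault((i // window, j // window), []).append((i, j))
    return groups


def grid_groups(height, width, grid):
    """Group positions into sparse grid x grid lattices (dilated, global axis)."""
    stride_h, stride_w = height // grid, width // grid
    groups = {}
    for i in range(height):
        for j in range(width):
            groups.setdefault((i % stride_h, j % stride_w), []).append((i, j))
    return groups


# Hypothetical stage: a 4x4 low-resolution map is upsampled to match the skip map.
low_h, low_w = 4, 4
up_h, up_w = transposed_conv_size(low_h), transposed_conv_size(low_w)

# Assumption: fusion concatenates along channels (the abstract says only "integrating").
up_channels, skip_channels = 32, 32
fused_channels = up_channels + skip_channels

# Assumption: MaxViT-style partitioning with window size 4 and grid size 4.
blocks = block_groups(up_h, up_w, window=4)
grids = grid_groups(up_h, up_w, grid=4)

print((up_h, up_w, fused_channels))  # (8, 8, 64)
print(blocks[(0, 0)][:3])            # [(0, 0), (0, 1), (0, 2)]
print(grids[(0, 0)][:3])             # [(0, 0), (0, 2), (0, 4)]
```

Under these assumptions, each block group holds adjacent positions while each grid group holds positions spaced evenly across the whole map, so attending within both kinds of group mixes local and global context without computing full attention over every pair of positions.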

Summary (original)

Since their emergence, Convolutional Neural Networks (CNNs) have made significant strides in medical image analysis. However, the local nature of the convolution operator may pose a limitation for capturing global and long-range interactions in CNNs. Recently, Transformers have gained popularity in the computer vision community and also in medical image segmentation due to their ability to process global features effectively. The scalability issues of the self-attention mechanism and lack of the CNN-like inductive bias may have limited their adoption. Therefore, hybrid Vision transformers (CNN-Transformer), exploiting the advantages of both Convolution and Self-attention Mechanisms, have gained importance. In this work, we present MaxViT-UNet, a new Encoder-Decoder based UNet type hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal memory and computational burden. The inclusion of multi-axis self-attention, within each decoder stage, significantly enhances the discriminating capacity between the object and background regions, thereby helping in improving the segmentation efficiency. In the Hybrid Decoder, a new block is also proposed. The fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through the utilization of a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to segment the nuclei regions progressively. Experimental results on MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.

arXiv information

Authors	Abdul Rehman Khan, Asifullah Khan
Published	2024-03-29 12:50:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
