MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation

Summary

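Recently, the Vision Transformer (ViT) has been widely adopted across computer vision because it applies the self-attention mechanism in the spatial domain to model global knowledge.
In medical image segmentation (MIS) in particular, many works combine ViT with CNNs, and some use pure ViT-based models directly.
However, recent work has improved these models only in the spatial domain while overlooking the importance of frequency-domain information.
We therefore propose the Multi-axis External Weights UNet (MEW-UNet) for MIS, a U-shaped architecture in which the self-attention of ViT is replaced by our Multi-axis External Weights block.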
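Specifically, the block applies a Fourier transform along three axes of the input feature and assigns external weights, produced by a Weights Generator, in the frequency domain.
An inverse Fourier transform then maps the features back to the spatial domain.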
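The transform, weight, inverse-transform sequence described above can be sketched in NumPy. This is a minimal illustration, not the authors' code (the repository linked below has that). It assumes that "three axes" means separate 1-D real FFTs along the channel, height, and width axes of a (C, H, W) feature, that the three branch outputs are simply averaged, and that `make_external_weights` is a hypothetical random stand-in for the learned Weights Generator.

```python
import numpy as np


def make_external_weights(shape, seed=0):
    """Hypothetical stand-in for the Weights Generator (assumption).

    Returns one complex weight tensor per axis, shaped like the rfft
    output along that axis. Random here; a real model would learn them.
    """
    rng = np.random.default_rng(seed)
    weights = []
    for axis in range(3):
        freq_shape = list(shape)
        freq_shape[axis] = shape[axis] // 2 + 1  # rfft keeps n//2 + 1 bins
        real = rng.standard_normal(freq_shape)
        imag = rng.standard_normal(freq_shape)
        weights.append(real + 1j * imag)
    return weights


def multi_axis_frequency_filter(x, weights):
    """Filter a (C, H, W) feature map in the frequency domain along each axis.

    Per axis: real FFT -> elementwise multiply by a complex external
    weight -> inverse FFT back to the spatial domain. Averaging the three
    branches is an assumed fusion rule, not taken from the paper.
    """
    assert x.ndim == 3, "expected a (C, H, W) feature map"
    outputs = []
    for axis, w in enumerate(weights):
        spec = np.fft.rfft(x, axis=axis)                     # to frequency domain
        spec = spec * w                                      # assign external weights
        y = np.fft.irfft(spec, n=x.shape[axis], axis=axis)   # back to spatial domain
        outputs.append(y)
    return np.mean(outputs, axis=0)


x = np.random.default_rng(1).standard_normal((8, 16, 16))
y = multi_axis_frequency_filter(x, make_external_weights(x.shape))
print(y.shape)  # (8, 16, 16)
```

By the convolution theorem, multiplying the spectrum along an axis amounts to a circular convolution along that axis with a kernel as long as the axis itself. That is why a frequency-domain weight can mix information globally at O(n log n) cost instead of the O(n²) of self-attention.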
We evaluate the model on four datasets and achieve state-of-the-art performance.
In particular, on the Synapse dataset our method achieves an HD95 10.15 mm lower (better) than MT-UNet.
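HD95, the metric behind the 10.15 mm figure, is the 95th percentile of the surface-to-surface distances between a predicted segmentation and the ground truth. The sketch below assumes the boundary points have already been extracted as coordinate arrays in millimetres. It follows the common convention (used, for example, by MedPy's `hd95`) of pooling both directed distance sets before taking the percentile. Other toolkits define HD95 slightly differently, so this is an illustration rather than the exact evaluation code behind the paper's numbers.

```python
import numpy as np


def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance between two (N, D) point sets.

    Units are those of the input coordinates (assumed to be mm here).
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # each point in A to its nearest point in B
    b_to_a = d.min(axis=0)  # each point in B to its nearest point in A
    # Pool both directions, then take the 95th percentile instead of the max.
    return float(np.percentile(np.concatenate([a_to_b, b_to_a]), 95))


boundary = [[i, 0] for i in range(20)]
with_outlier = boundary + [[0, 100]]
print(hd95(boundary, with_outlier))  # 0.0 (the plain Hausdorff distance would be 100.0)
```

Because the score is a percentile rather than a maximum, a few stray outlier voxels do not dominate it. Voxel indices must be scaled by the voxel spacing before the result can be reported in millimetres.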
The code is available at https://github.com/JCruan519/MEW-UNet.

Summary (original)

Recently, Visual Transformer (ViT) has been widely used in various fields of computer vision due to applying self-attention mechanism in the spatial domain to modeling global knowledge. Especially in medical image segmentation (MIS), many works are devoted to combining ViT and CNN, and even some works directly utilize pure ViT-based models. However, recent works improved models in the aspect of spatial domain while ignoring the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) for MIS based on the U-shape architecture by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input feature and assigns the external weight in the frequency domain, which is generated by our Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets and achieve state-of-the-art performances. In particular, on the Synapse dataset, our method outperforms MT-UNet by 10.15mm in terms of HD95. Code is available at https://github.com/JCruan519/MEW-UNet.

arXiv information

Authors	Jiacheng Ruan, Mingye Xie, Suncheng Xiang, Ting Liu, Yuzhuo Fu
Published	2022-10-25 13:22:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google