Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

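Summary

This paper presents a deep learning framework for medical video segmentation.
Convolutional neural network (CNN) and transformer-based methods have reached major milestones in medical image segmentation, thanks to their strong semantic feature encoding and global context understanding.
However, most existing approaches ignore a salient aspect of medical video data: the temporal dimension.
The proposed framework explicitly extracts features from neighbouring frames along the temporal dimension and combines them with a temporal feature blender. The blended high-level spatio-temporal feature is then tokenised and encoded by a Swin Transformer to form a strong global feature.
The final segmentation results are produced by a UNet-like encoder-decoder architecture.
The model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving Dice coefficients of 0.8986 and 0.8186 on the two datasets tested.
The studies also demonstrate the efficacy of the temporal feature blending scheme and the cross-dataset transferability of the learned capabilities.
Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.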
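
The Dice coefficient reported above measures the overlap between a predicted mask and a ground-truth mask: Dice = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that metric in plain Python, not the evaluation code from the authors' repository. The function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as equally shaped nested lists.

    Hypothetical helper for illustration; not taken from the Video-SwinUNet code.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels that are foreground in both masks
    pred_sum = 0      # foreground pixels in the prediction
    target_sum = 0    # foreground pixels in the ground truth
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = int(bool(p)), int(bool(t))
            intersection += p & t
            pred_sum += p
            target_sum += t

    denominator = pred_sum + target_sum
    if denominator == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denominator


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported values of 0.8986 and 0.8186 indicate substantial but imperfect overlap.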

Abstract (original)

This paper presents a deep learning framework for medical video segmentation. Convolution neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data – the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving a dice coefficient of 0.8986 and 0.8186 for the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and cross-dataset transferability of learned capabilities. Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.

arXiv information

Authors	Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt
Published	2023-02-22 12:09:39+00:00

Source and services used

arxiv.jp, Google
