VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation

Summary

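Self-supervised pretraining techniques have been widely used to improve performance on downstream tasks.
However, real-world magnetic resonance (MR) studies typically consist of different sets of contrasts because of varying acquisition protocols, and since current deep learning methods usually require a fixed set of input modalities or contrasts, this poses challenges both for large-scale pretraining and for downstream tasks with differing input requirements.
To address this challenge, we propose Variable-Input ViT (VIViT), a transformer-based framework designed for self-supervised pretraining and segmentation fine-tuning with a varying set of contrasts in each study.
This capability lets our approach maximize the data available for pretraining and transfer the knowledge learned during pretraining to downstream tasks despite differences in input requirements.
We validate the method on brain infarct and brain tumor segmentation, where it outperforms current CNN- and ViT-based models with mean Dice scores of 0.624 and 0.883, respectively.
These results highlight the effectiveness of the design in improving adaptability and performance on tasks with real-world heterogeneous MR data.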
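The mean Dice scores reported above compare predicted and reference segmentation masks. As a reminder of what the metric measures, the following Python sketch computes the Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks flattened to lists of voxels. The function names are illustrative; scoring two empty masks as a perfect match is a common convention rather than something the summary specifies, and averaging per case is an assumption about how the "mean" is taken.

```python
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground voxel counts of each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(cases: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average per-case Dice scores over a test set (assumed aggregation)."""
    if not cases:
        raise ValueError("need at least one case")
    return sum(dice_score(p, t) for p, t in cases) / len(cases)


if __name__ == "__main__":
    pred = [0, 1, 1, 1, 0, 0]
    target = [0, 1, 1, 0, 1, 0]
    print(dice_score(pred, target))
```

In this toy example two voxels overlap and each mask has three foreground voxels, so the score is 4/6. Real 3D volumes would simply be flattened before scoring.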

Summary (original)

Self-supervised pretraining techniques have been widely used to improve the downstream tasks’ performance. However, real-world magnetic resonance (MR) studies usually consist of different sets of contrasts due to different acquisition protocols, which poses challenges for the current deep learning methods on large-scale pretraining and different downstream tasks with different input requirements, since these methods typically require a fixed set of input modalities or contrasts. To address this challenge, we propose variable-input ViT (VIViT), a transformer-based framework designed for self-supervised pretraining and segmentation finetuning for variable contrasts in each study. With this ability, our approach can maximize the data availability in pretraining, and can transfer the learned knowledge from pretraining to downstream tasks despite variations in input requirements. We validate our method on brain infarct and brain tumor segmentation, where our method outperforms current CNN and ViT-based models with a mean Dice score of 0.624 and 0.883 respectively. These results highlight the efficacy of our design for better adaptability and performance on tasks with real-world heterogeneous MR data.
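The abstract does not say how VIViT accepts a different set of contrasts per study. One generic way a transformer can take a variable set of inputs is to tokenize each available contrast separately with a shared projection, add a learned embedding that identifies the contrast, and concatenate everything into one token sequence, so a missing contrast simply contributes no tokens. The Python sketch below illustrates only that generic idea under stated assumptions: the class `VariableContrastTokenizer`, the contrast names, and the patch and embedding sizes are hypothetical, random weights stand in for learned ones, and this is not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Volume = List[List[List[float]]]  # indexed [depth][height][width]


class VariableContrastTokenizer:
    """Hypothetical tokenizer: any subset of known contrasts -> one token sequence."""

    def __init__(self, contrasts: Sequence[str], patch: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.patch = patch
        self.dim = dim
        n_in = patch ** 3
        # One projection shared by all contrasts (stand-in for a learned linear layer).
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_in)]
        # One embedding per contrast so the transformer can tell token sources apart.
        self.contrast_emb = {c: [rng.gauss(0.0, 0.02) for _ in range(dim)] for c in contrasts}

    def _patches(self, vol: Volume) -> List[List[float]]:
        """Split a volume into non-overlapping cubic patches, each flattened."""
        p = self.patch
        d, h, w = len(vol), len(vol[0]), len(vol[0][0])
        if d % p or h % p or w % p:
            raise ValueError("volume dimensions must be divisible by the patch size")
        out = []
        for z in range(0, d, p):
            for y in range(0, h, p):
                for x in range(0, w, p):
                    out.append([vol[z + i][y + j][x + k]
                                for i in range(p) for j in range(p) for k in range(p)])
        return out

    def __call__(self, study: Dict[str, Volume]) -> Tuple[List[List[float]], List[str]]:
        tokens, labels = [], []
        # Deterministic order; contrasts absent from the study add no tokens.
        for name in sorted(study):
            if name not in self.contrast_emb:
                raise KeyError(f"unknown contrast: {name}")
            emb = self.contrast_emb[name]
            for patch in self._patches(study[name]):
                # Linear projection of the patch plus the contrast embedding.
                token = [sum(v * self.proj[r][c] for r, v in enumerate(patch)) + emb[c]
                         for c in range(self.dim)]
                tokens.append(token)
                labels.append(name)
        return tokens, labels


if __name__ == "__main__":
    def constant_volume(n: int, value: float) -> Volume:
        return [[[value] * n for _ in range(n)] for _ in range(n)]

    tok = VariableContrastTokenizer(["T1", "T2", "FLAIR", "DWI"], patch=2, dim=8)
    # A study with only two of the four known contrasts.
    tokens, labels = tok({"T1": constant_volume(4, 1.0), "FLAIR": constant_volume(4, 0.5)})
    print(len(tokens), labels.count("FLAIR"))
```

Because the projection is shared and only the contrast embedding differs, the same model can consume studies with one, two, or all four contrasts; a downstream transformer would then need to handle variable sequence lengths, for example through padding masks.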

arXiv information

Authors	Badhan Kumar Das, Ajay Singh, Gengyan Zhao, Han Liu, Thomas J. Re, Dorin Comaniciu, Eli Gibson, Andreas Maier
Published	2025-05-13 15:52:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
