Multi-modal Vision Pre-training for Medical Image Analysis

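Summary

Self-supervised learning has greatly facilitated medical image analysis by reducing the amount of training data required for real-world applications.
Current paradigms rely predominantly on self-supervision within uni-modal image data, and therefore neglect the inter-modal correlations that are essential for learning cross-modal image representations effectively.
This limitation is particularly significant for naturally grouped multi-modal data, such as multi-parametric MRI scans of a patient who undergoes various functional imaging protocols in the same study.
To bridge this gap, we conduct a novel multi-modal image pre-training with three proxy tasks, namely cross-modal image reconstruction, modality-aware contrastive learning, and modality template distillation, to facilitate the learning of cross-modality representations and correlations from multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients).
To demonstrate the generalizability of the pre-trained model, we conduct extensive experiments on various benchmarks with ten downstream tasks.
Compared with state-of-the-art pre-training methods, our method improves the Dice score by 0.28% to 14.47% across six segmentation benchmarks and delivers a consistent accuracy gain of 0.65% to 18.07% on four individual image classification tasks.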

Abstract (original)

Self-supervised learning has greatly facilitated medical image analysis by suppressing the training data requirement for real-world applications. Current paradigms predominantly rely on self-supervision within uni-modal image data, thereby neglecting the inter-modal correlations essential for effective learning of cross-modal image representations. This limitation is particularly significant for naturally grouped multi-modal data, e.g., multi-parametric MRI scans for a patient undergoing various functional imaging protocols in the same study. To bridge this gap, we conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations using multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients), i.e., cross-modal image reconstruction, modality-aware contrastive learning, and modality template distillation. To demonstrate the generalizability of our pre-trained model, we conduct extensive experiments on various benchmarks with ten downstream tasks. The superior performance of our method is reported in comparison to state-of-the-art pre-training methods, with Dice Score improvement of 0.28%-14.47% across six segmentation benchmarks and a consistent accuracy boost of 0.65%-18.07% in four individual image classification tasks.
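
For readers unfamiliar with the Dice Score used to report the segmentation results, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), for two binary masks. It is not taken from the paper: the function name `dice_score`, the flat-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1).

    Illustrative helper only; the name and the input format are assumptions,
    not the evaluation code used by the authors.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3), roughly 0.667
```

A score of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground voxel, so the improvements quoted above are differences on this 0 to 1 scale expressed in percent.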

arXiv information

Authors	Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang
Published	2025-03-14 14:32:09+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
