HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image

Summary

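Survival prediction from whole slide images (WSIs) is a challenging patient-level multiple instance learning (MIL) task.
Because each patient contributes a vast amount of data (one or more gigapixel WSIs) and WSIs are irregularly shaped, it is difficult to fully explore the spatial, contextual, and hierarchical interactions within a patient-level bag.
Many studies rely on random-sampling pre-processing and WSI-level aggregation models, which inevitably discard critical prognostic information in the patient-level bag.
In this work, we propose HVTSurv, a hierarchical vision Transformer framework that encodes local-level relative spatial information, strengthens WSI-level context-aware communication, and establishes patient-level hierarchical interaction.
First, we design a feature pre-processing strategy consisting of feature rearrangement and random window masking.
We then devise three layers that progressively build a patient-level representation: a local-level interaction layer based on Manhattan distance, a WSI-level interaction layer based on spatial shuffle, and a patient-level interaction layer based on attention pooling.
The hierarchical network design also makes the model more computationally efficient.
Finally, we validate HVTSurv on 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA).
Its average C-Index is 2.50 to 11.30% higher than that of all prior weakly supervised methods across the 6 TCGA datasets.
An ablation study and attention visualization further support the advantages of the proposed HVTSurv.
The implementation is available at https://github.com/szc19990412/HVTSurv.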
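
The C-Index (concordance index) used as the evaluation metric above can be illustrated with a small sketch. This is a plain-Python version of Harrell's concordance index, not code taken from the HVTSurv repository. The function name `concordance_index`, its arguments, and the toy data are assumptions made for illustration, and pairs with tied survival times are simply skipped to keep the example short.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the HVTSurv code).

    times  : observed survival or censoring time per patient
    events : True if the event (death) was observed, False if censored
    risks  : predicted risk score per patient (higher = shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplification: pairs with tied times are skipped entirely.
        if times[i] == times[j]:
            continue
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk, shorter survival: correct order
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one is censored.
times = [5, 10, 12, 20]
events = [True, False, True, True]
risks = [0.8, 0.3, 0.1, 0.2]
print(concordance_index(times, events, risks))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk. In practice an established library implementation would normally be used instead of this sketch.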

Abstract (original)

Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL). Due to the vast amount of data for a patient (one or multiple gigapixels WSIs) and the irregularly shaped property of WSI, it is difficult to fully explore spatial, contextual, and hierarchical interaction in the patient-level bag. Many studies adopt random sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag. In this work, we propose a hierarchical vision Transformer framework named HVTSurv, which can encode the local-level relative spatial information, strengthen WSI-level context-aware communication, and establish patient-level hierarchical interaction. Firstly, we design a feature pre-processing strategy, including feature rearrangement and random window masking. Then, we devise three layers to progressively obtain patient-level representation, including a local-level interaction layer adopting Manhattan distance, a WSI-level interaction layer employing spatial shuffle, and a patient-level interaction layer using attention pooling. Moreover, the design of hierarchical network helps the model become more computationally efficient. Finally, we validate HVTSurv with 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than all the prior weakly supervised methods over 6 TCGA datasets. Ablation study and attention visualization further verify the superiority of the proposed HVTSurv. Implementation is available at: https://github.com/szc19990412/HVTSurv.

arXiv information

Authors	Zhuchen Shao, Yang Chen, Hao Bian, Jian Zhang, Guojun Liu, Yongbing Zhang
Published	2023-06-30 02:26:49+00:00

Source and services used

arxiv.jp, Google
