CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

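Summary

The emergence of large multimodal models (LMMs) has brought significant advances to pathology.
Previous research has focused mainly on training patch-level and whole-slide image (WSI)-level models separately, which limits the integration of learned knowledge across patches and WSIs and results in redundant models.
In this work, we introduce CPath-Omni, the first 15-billion-parameter LMM designed to unify patch-level and WSI-level image analysis, consolidating a variety of tasks at both levels, including classification, visual question answering, captioning, and visual referring prompting.
Extensive experiments show that CPath-Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 of 42 datasets, outperforming or matching task-specific models trained for individual tasks.
We also develop CPath-CLIP, a specialized pathology CLIP-based visual processor for CPath-Omni. It is the first to integrate different vision models and to incorporate a large language model as the text encoder in order to build a more powerful CLIP model.
CPath-CLIP achieves SOTA performance on nine zero-shot and four few-shot datasets.
Our findings highlight CPath-Omni's ability to unify diverse pathology tasks and demonstrate its potential to streamline and advance the field of foundation models in pathology.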

Abstract (original)

The emergence of large multimodal models (LMMs) has brought significant advancements to pathology. Previous research has primarily focused on separately training patch-level and whole-slide image (WSI)-level models, limiting the integration of learned knowledge across patches and WSIs, and resulting in redundant models. In this work, we introduce CPath-Omni, the first 15-billion-parameter LMM designed to unify both patch and WSI level image analysis, consolidating a variety of tasks at both levels, including classification, visual question answering, captioning, and visual referring prompting. Extensive experiments demonstrate that CPath-Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 out of 42 datasets, outperforming or matching task-specific models trained for individual tasks. Additionally, we develop a specialized pathology CLIP-based visual processor for CPath-Omni, CPath-CLIP, which, for the first time, integrates different vision models and incorporates a large language model as a text encoder to build a more powerful CLIP model, which achieves SOTA performance on nine zero-shot and four few-shot datasets. Our findings highlight CPath-Omni’s ability to unify diverse pathology tasks, demonstrating its potential to streamline and advance the field of foundation model in pathology.

arXiv information

Authors	Yuxuan Sun, Yixuan Si, Chenglu Zhu, Xuan Gong, Kai Zhang, Pingyi Chen, Ye Zhang, Zhongyi Shui, Tao Lin, Lin Yang
Published	2024-12-16 18:46:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
