Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis

Original abstract

Pathological diagnosis is vital for determining disease characteristics, guiding treatment, and assessing prognosis, and it relies heavily on detailed, multi-scale analysis of high-resolution whole slide images (WSIs). However, traditional pure vision models suffer from redundant feature extraction, whereas existing large vision-language models (LVLMs) are limited by input resolution constraints, which hinders their efficiency and accuracy. To overcome these issues, we propose two innovative strategies: mixed task-guided feature enhancement, which directs feature extraction toward lesion-related details across scales, and prompt-guided detail feature completion, which integrates coarse- and fine-grained features from WSIs based on specific prompts without compromising inference speed. Leveraging a comprehensive dataset of 490,000 samples from diverse pathology tasks, including cancer detection, grading, and vascular and neural invasion identification, we trained a pathology-specialized LVLM, OmniPath. Extensive experiments demonstrate that this model significantly outperforms existing methods in diagnostic accuracy and efficiency, offering an interactive, clinically aligned approach to auxiliary diagnosis across a wide range of pathology applications.
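The abstract does not say how prompt-guided detail feature completion is implemented. What follows is a purely hypothetical sketch of the general idea, not OmniPath's actual method. It assumes that each fine-grained WSI tile and the text prompt already have embeddings in a shared vector space. It scores tiles by cosine similarity to the prompt, keeps only the top-k tiles, and appends them to a fixed set of coarse tokens. The function names, the cosine-similarity scoring, and the parameter `k` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_detail_tiles(prompt_emb: Vector, fine_tiles: List[Vector], k: int) -> List[int]:
    """Hypothetical helper: indices of the k fine tiles most similar to the prompt.

    Indices are returned in their original (spatial) order so that downstream
    tokens keep a stable layout.
    """
    ranked = sorted(
        range(len(fine_tiles)),
        key=lambda i: cosine(prompt_emb, fine_tiles[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def build_visual_tokens(
    coarse_tokens: List[Vector], fine_tiles: List[Vector], prompt_emb: Vector, k: int
) -> List[Vector]:
    """Hypothetical fusion: all coarse tokens plus the top-k prompt-relevant fine tiles.

    The result has len(coarse_tokens) + min(k, len(fine_tiles)) entries, so the
    language model's input length stays bounded however many tiles a slide has.
    """
    chosen = select_detail_tiles(prompt_emb, fine_tiles, k)
    return list(coarse_tokens) + [fine_tiles[i] for i in chosen]


if __name__ == "__main__":
    coarse = [[1.0, 0.0], [0.0, 1.0]]  # low-resolution, whole-slide context
    fine = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]  # high-resolution tiles
    prompt = [1.0, 0.0]  # stand-in for an embedded question about a lesion
    print(select_detail_tiles(prompt, fine, 2))  # [0, 2]
    print(len(build_visual_tokens(coarse, fine, prompt, 2)))  # 4
```

In this sketch, the design choice that matters for inference speed is the fixed budget `k`. The token count grows with `k` rather than with the number of tiles in the slide. A real system would compute the embeddings with learned encoders and might fuse features differently.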

arXiv information

Authors: Shengxuming Zhang, Weihan Li, Tianhong Gao, Jiacong Hu, Haoming Luo, Mingli Song, Xiuming Zhang, Zunlei Feng
Published: 2024-12-12 18:07:23+00:00
arXiv page: arxiv_id (pdf)

Source and services used

arxiv.jp, Google