Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology

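Summary

In this paper, we address the challenge of few-shot classification of histopathology whole slide images (WSIs) by leveraging foundational vision-language models (VLMs) and slide-level prompt learning.
Given the gigapixel scale of WSIs, conventional multiple instance learning (MIL) methods rely on aggregation functions to derive slide-level (bag-level) predictions from patch representations, and training these functions requires extensive bag-level labels.
In contrast, VLM-based approaches excel at aligning visual embeddings of patches with text prompts for candidate classes, but they lack essential pathological prior knowledge.
Our method distinguishes itself by using pathological prior knowledge from language models to identify the local tissue types (patches) that are crucial for WSI classification, and by integrating this knowledge within a VLM-based MIL framework.
Our approach effectively aligns patch images with tissue types and fine-tunes the model via prompt learning using only a few labeled WSIs per category.
Experiments on real-world pathological WSI datasets, together with ablation studies, highlight the method's superior performance over existing MIL- and VLM-based methods on few-shot WSI classification tasks.
Our code is publicly available at https://github.com/LTS5/SLIP.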
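The summary contrasts SLIP with conventional MIL, where a trainable aggregation function turns many patch embeddings into one slide-level prediction. A minimal sketch of one such aggregator, the non-gated attention pooling of Ilse et al. (2018), is shown below. It is not the paper's method; the function names, the toy vectors, and the parameters `V`, `w`, `W_cls`, `b_cls` are hypothetical, and in practice these parameters are learned, which is why this family of methods needs many bag-level labels.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mil_pool(patches, V, w):
    """Attention-based MIL pooling (non-gated form, after Ilse et al., 2018).

    patches: N patch embeddings, each of length D (hypothetical toy vectors)
    V: L x D projection matrix; w: attention vector of length L
    Returns the pooled bag (slide) embedding and the per-patch attention weights.
    """
    scores = []
    for h in patches:
        # hidden = tanh(V h), then score = w^T hidden
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))
    alphas = softmax(scores)  # attention weights sum to 1 over the bag
    dim = len(patches[0])
    bag = [sum(a * h[d] for a, h in zip(alphas, patches)) for d in range(dim)]
    return bag, alphas

def slide_logits(bag, W_cls, b_cls):
    """Linear slide-level classifier applied to the pooled bag embedding."""
    return [sum(wi * xi for wi, xi in zip(row, bag)) + b for row, b in zip(W_cls, b_cls)]

# Usage with toy values: patch 0 is scored highest by this (untrained) attention vector.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]
bag, alphas = attention_mil_pool(patches, V, w=[1.0, 0.0])
```

Setting `w` to zeros makes every attention weight equal, so the pooling reduces to a plain mean over patches; a learned `w` instead lets a few informative patches dominate the slide embedding.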

Summary (original)

In this paper, we address the challenge of few-shot classification in histopathology whole slide images (WSIs) by utilizing foundational vision-language models (VLMs) and slide-level prompt learning. Given the gigapixel scale of WSIs, conventional multiple instance learning (MIL) methods rely on aggregation functions to derive slide-level (bag-level) predictions from patch representations, which require extensive bag-level labels for training. In contrast, VLM-based approaches excel at aligning visual embeddings of patches with candidate class text prompts but lack essential pathological prior knowledge. Our method distinguishes itself by utilizing pathological prior knowledge from language models to identify crucial local tissue types (patches) for WSI classification, integrating this within a VLM-based MIL framework. Our approach effectively aligns patch images with tissue types, and we fine-tune our model via prompt learning using only a few labeled WSIs per category. Experimentation on real-world pathological WSI datasets and ablation studies highlight our method’s superior performance over existing MIL- and VLM-based methods in few-shot WSI classification tasks. Our code is publicly available at https://github.com/LTS5/SLIP.
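The VLM-based side of the contrast, aligning patch embeddings with candidate class text prompts, can be sketched as below. This is a generic zero-shot-style baseline, not SLIP itself: it assumes patch and prompt embeddings were already produced by some vision-language model (toy 2-D vectors stand in for them), scores each class by the mean of its top-k patch-to-prompt cosine similarities, and applies a softmax with a temperature of 0.07. The names `slide_class_probs`, `k`, and `temperature` and their values are hypothetical; the paper's pathological prior knowledge and learned slide-level prompts are not modeled.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def slide_class_probs(patch_embs, prompt_embs, k=2, temperature=0.07):
    """Top-k mean pooling of patch-to-prompt similarities, then softmax over classes.

    patch_embs: N patch image embeddings (hypothetical, from some VLM image encoder)
    prompt_embs: C text-prompt embeddings, one per candidate class
    """
    logits = []
    for t in prompt_embs:
        sims = sorted((cosine_similarity(p, t) for p in patch_embs), reverse=True)
        top = sims[:k]  # keep only the k patches most aligned with this class prompt
        logits.append(sum(top) / len(top) / temperature)
    return softmax(logits)

# Usage with toy embeddings: two patches resemble class 0's prompt, one resembles class 1's.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prompts = [[1.0, 0.0], [0.0, 1.0]]
probs = slide_class_probs(patches, prompts, k=2)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Top-k pooling lets a handful of strongly aligned patches decide the slide label, which suits slides where diagnostic tissue covers only a small fraction of the area. The choice of `k` matters: with `k=1` and these toy vectors both classes tie, because each prompt has exactly one perfectly aligned patch.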

arXiv information

Authors	Devavrat Tomar, Guillaume Vray, Dwarikanath Mahapatra, Sudipta Roy, Jean-Philippe Thiran, Behzad Bozorgtabar
Published	2025-03-21 15:40:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
