SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

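Summary

Multimodal large language models (MLLMs) have made progress in computational pathology, but they focus mainly on patch-level analysis and miss essential contextual information at the whole-slide level.
The lack of large-scale instruction datasets and the gigapixel scale of whole-slide images (WSIs) make such models difficult to develop.
This paper presents SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, with strong multimodal conversational ability and the ability to respond to complex instructions across diverse pathology scenarios.
To support its development, the authors created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs in multiple categories.
They also propose SlideBench, a multimodal benchmark that combines captioning and VQA tasks to evaluate SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis.
Compared with both general and specialized MLLMs, SlideChat achieves state-of-the-art performance on 18 of 22 tasks.
For example, it reaches an overall accuracy of 81.17% on SlideBench-VQA (TCGA) and 54.15% on SlideBench-VQA (BCNB).
The code, data, and models are publicly available at https://uni-medical.github.io/SlideChat.github.io.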

Abstract (original)

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs with multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA) and 54.15% on SlideBench-VQA (BCNB). Our code, data, and models are publicly accessible at https://uni-medical.github.io/SlideChat.github.io.

arXiv information

Authors	Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He
Published	2025-03-19 17:56:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
