VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning

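Summary

We present VideoPath-LLaVA, the first large multimodal model (LMM) in computational pathology to integrate three distinct image scenarios (single patch images, automatically keyframe-extracted clips, and manually segmented video pathology images), mimicking the natural diagnostic process of pathologists.
By generating detailed histological descriptions that culminate in a definitive sign-out diagnosis, VideoPath-LLaVA bridges visual narratives with diagnostic reasoning.
Central to our approach is the VideoPath-Instruct dataset, comprising 4278 video and diagnosis-specific chain-of-thought instruction pairs sourced from educational histopathology videos on YouTube.
High-quality data is critical for enhancing diagnostic reasoning, but creating it is time-intensive and the available volume is limited.
To overcome this challenge, we transfer knowledge from existing single-image instruction datasets to train on weakly annotated, keyframe-extracted clips, and then fine-tune on manually segmented videos.
VideoPath-LLaVA establishes a new benchmark in pathology video analysis and offers a promising foundation for future AI systems that support clinical decision-making through integrated visual and diagnostic reasoning.
Our code, data, and model are publicly available at https://github.com/trinhvg/VideoPath-LLaVA.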
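
The staged training strategy described above can be sketched as an ordered curriculum. This is only an illustrative outline based on the abstract, not the authors' implementation: the stage names, the `Stage` class, and the `run_curriculum` helper are hypothetical, and treating the single-image knowledge transfer as its own first stage is an assumption, since the abstract does not say how that transfer is carried out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str         # hypothetical label, not taken from the paper
    data: str         # kind of visual input the stage consumes
    supervision: str  # how that data was annotated, per the abstract


# Order follows the abstract: single images, then weakly annotated clips,
# then manually segmented videos.
CURRICULUM: List[Stage] = [
    Stage("image_transfer", "single patch images", "existing instruction datasets"),
    Stage("clip_training", "keyframe-extracted clips", "weak annotations"),
    Stage("video_finetune", "manually segmented videos", "manual segmentation"),
]


def run_curriculum(train_step: Callable[[Stage, List[str]], List[str]]) -> List[str]:
    """Apply train_step to each stage in order, passing the state along."""
    state: List[str] = []
    for stage in CURRICULUM:
        state = train_step(stage, state)
    return state


def record_stage(stage: Stage, state: List[str]) -> List[str]:
    # Stand-in for real training: only records which stage ran.
    return state + [stage.name]


history = run_curriculum(record_stage)
print(history)
```

In a real pipeline, `record_stage` would be replaced by a function that loads the stage's data and updates model weights; the point of the sketch is only that each stage starts from the result of the previous one.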

Abstract (original)

We present VideoPath-LLaVA, the first large multimodal model (LMM) in computational pathology that integrates three distinct image scenarios, single patch images, automatically keyframe-extracted clips, and manually segmented video pathology images, to mimic the natural diagnostic process of pathologists. By generating detailed histological descriptions and culminating in a definitive sign-out diagnosis, VideoPath-LLaVA bridges visual narratives with diagnostic reasoning. Central to our approach is the VideoPath-Instruct dataset, comprising 4278 video and diagnosis-specific chain-of-thought instructional pairs sourced from educational histopathology videos on YouTube. Although high-quality data is critical for enhancing diagnostic reasoning, its creation is time-intensive and limited in volume. To overcome this challenge, we transfer knowledge from existing single-image instruction datasets to train on weakly annotated, keyframe-extracted clips, followed by fine-tuning on manually segmented videos. VideoPath-LLaVA establishes a new benchmark in pathology video analysis and offers a promising foundation for future AI systems that support clinical decision-making through integrated visual and diagnostic reasoning. Our code, data, and model are publicly available at https://github.com/trinhvg/VideoPath-LLaVA.

arXiv information

Authors	Trinh T. L. Vuong, Jin Tae Kwak
Published	2025-05-07 07:41:19+00:00

Sources and services used

arxiv.jp, Google
