MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks

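Abstract

Medical vision-language models (VLMs) have shown promise as clinical assistants across a range of medical fields.
However, specialized dermatology VLMs capable of delivering professional, detailed diagnostic analysis remain underdeveloped, mainly because current multimodal dermatology datasets contain few specialized text descriptions.
To address this issue, we propose MM-Skin, the first large-scale multimodal dermatology dataset, which covers three imaging modalities (clinical, dermoscopic, and pathological) and contains nearly 10k high-quality image-text pairs collected from professional textbooks.
In addition, we generate more than 27k diverse, instruction-following vision question answering (VQA) samples, nine times the size of the largest current dermatology VQA dataset.
Leveraging public datasets together with MM-Skin, we developed SkinVL, a dermatology-specific VLM designed for precise and nuanced interpretation of skin diseases.
Comprehensive benchmark evaluations of SkinVL on VQA, supervised fine-tuning (SFT), and zero-shot classification tasks across 8 datasets show exceptional performance on skin diseases compared with both general-purpose and medical VLMs.
The introduction of MM-Skin and SkinVL makes a meaningful contribution to advancing the development of clinical dermatology VLM assistants.
MM-Skin is available at https://github.com/ZwQ803/MM-Skin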

Abstract (original)

Medical vision-language models (VLMs) have shown promise as clinical assistants across various medical fields. However, specialized dermatology VLM capable of delivering professional and detailed diagnostic analysis remains underdeveloped, primarily due to less specialized text descriptions in current dermatology multimodal datasets. To address this issue, we propose MM-Skin, the first large-scale multimodal dermatology dataset that encompasses 3 imaging modalities, including clinical, dermoscopic, and pathological and nearly 10k high-quality image-text pairs collected from professional textbooks. In addition, we generate over 27k diverse, instruction-following vision question answering (VQA) samples (9 times the size of current largest dermatology VQA dataset). Leveraging public datasets and MM-Skin, we developed SkinVL, a dermatology-specific VLM designed for precise and nuanced skin disease interpretation. Comprehensive benchmark evaluations of SkinVL on VQA, supervised fine-tuning (SFT) and zero-shot classification tasks across 8 datasets, reveal its exceptional performance for skin diseases in comparison to both general and medical VLM models. The introduction of MM-Skin and SkinVL offers a meaningful contribution to advancing the development of clinical dermatology VLM assistants. MM-Skin is available at https://github.com/ZwQ803/MM-Skin

arXiv information

Authors	Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan
Published	2025-05-09 16:03:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
