XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

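Summary

Recent breakthroughs in large vision-language models such as Bard and GPT-4 have demonstrated extraordinary abilities across a wide range of tasks.
Such models are trained on massive datasets comprising billions of public image-text pairs covering diverse tasks.
However, their performance in task-specific domains such as radiology remains under-investigated and is potentially limited by a lack of sophistication in understanding biomedical images.
Conversational medical models, on the other hand, have shown remarkable success but have focused mainly on text-based analysis.
In this paper, we introduce XrayGPT, a novel conversational medical vision-language model that can analyze and answer open-ended questions about chest radiographs.
Specifically, we align a medical visual encoder (MedClip) with a fine-tuned large language model (Vicuna) using a simple linear transformation.
This alignment gives our model exceptional visual conversation abilities, grounded in a deep understanding of radiographs and medical domain knowledge.
To enhance the performance of LLMs in the medical context, we generate ~217k interactive, high-quality summaries from free-text radiology reports.
These summaries improve LLM performance through the fine-tuning process.
Our approach opens up new avenues of research for advancing the automated analysis of chest radiographs.
Our open-source demos, models, and instruction sets are available at https://github.com/mbzuai-oryx/XrayGPT.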
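
The alignment step mentioned in the summary (a simple linear transformation between the visual encoder and the language model) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random weights, and the helper names `make_projection` and `project` are assumptions chosen for illustration only. The sketch shows only the general idea of mapping each visual token from the encoder's feature size to the language model's embedding size with y = Wx + b.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Create hypothetical weights W (out_dim x in_dim) and a zero bias b."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weights = [
        [rng.uniform(-scale, scale) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, weights, bias):
    """Apply y = W x + b independently to every visual token."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weights, bias)]
        for token in features
    ]


# Hypothetical sizes; the summary does not state the real dimensions.
VISUAL_DIM = 8    # assumed feature size produced by the visual encoder
LLM_DIM = 16      # assumed embedding size expected by the language model
NUM_TOKENS = 4    # assumed number of visual tokens per image

rng = random.Random(1)
# Stand-in for encoder output: NUM_TOKENS vectors of length VISUAL_DIM.
visual_tokens = [
    [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]
    for _ in range(NUM_TOKENS)
]

weights, bias = make_projection(VISUAL_DIM, LLM_DIM)
llm_inputs = project(visual_tokens, weights, bias)

# Each visual token now has the size the language model expects.
print(len(llm_inputs), len(llm_inputs[0]))
```

In a real system the projection weights would be learned rather than random, and the projected tokens would be combined with the text prompt embeddings before being passed to the language model.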

Summary (original)

The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks. Such models are trained on massive datasets comprising billions of public image-text pairs with diverse tasks. However, their performance on task-specific domains, such as radiology, is still under-investigated and potentially limited due to a lack of sophistication in understanding biomedical images. On the other hand, conversational medical models have exhibited remarkable success but have mainly focused on text-based analysis. In this paper, we introduce XrayGPT, a novel conversational medical vision-language model that can analyze and answer open-ended questions about chest radiographs. Specifically, we align a medical visual encoder (MedClip) with a fine-tuned large language model (Vicuna), using a simple linear transformation. This alignment enables our model to possess exceptional visual conversation abilities, grounded in a deep understanding of radiographs and medical domain knowledge. To enhance the performance of LLMs in the medical context, we generate ~217k interactive and high-quality summaries from free-text radiology reports. These summaries serve to enhance the performance of LLMs through the fine-tuning process. Our approach opens up new avenues of research for advancing the automated analysis of chest radiographs. Our open-source demos, models, and instruction sets are available at: https://github.com/mbzuai-oryx/XrayGPT.

arXiv information

Authors	Omkar Thawakar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan
Published	2025-05-07 14:26:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
