Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages

Abstract

The integration of artificial intelligence in healthcare has opened new horizons for improving medical diagnostics and patient care. However, challenges persist in developing systems capable of generating accurate and contextually relevant radiology reports, particularly in low-resource languages. In this study, we present a comprehensive benchmark to evaluate the performance of instruction-tuned Vision-Language Models (VLMs) in the specialized task of radiology report generation across three low-resource languages: Italian, German, and Spanish. Employing the LLaVA architectural framework, we conducted a systematic evaluation of pre-trained models utilizing general datasets, domain-specific datasets, and low-resource language-specific datasets. Because no available models possess prior knowledge of both the medical domain and low-resource languages, we analyzed various adaptations to determine the most effective approach for these contexts. The results revealed that language-specific models substantially outperformed both general and domain-specific models in generating radiology reports, emphasizing the critical role of linguistic adaptation. Additionally, models fine-tuned with medical terminology exhibited enhanced performance across all languages compared to models with generic knowledge, highlighting the importance of domain-specific training. We also explored the influence of the temperature parameter on the coherence of report generation, providing insights for optimal model settings. Our findings highlight the importance of tailored language and domain-specific training for improving the quality and accuracy of radiological reports in multilingual settings. This research not only advances our understanding of the adaptability of VLMs in healthcare but also points to significant avenues for future investigations into model tuning and language-specific adaptations.
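
The abstract mentions the temperature parameter without saying how it acts on generation. The sketch below is a generic illustration of temperature-scaled softmax, not the authors' code: the function name softmax_with_temperature and the three example scores are assumptions made for this example. It shows that a low temperature concentrates probability on the top-scoring token, while a high temperature spreads it out, which is the usual reason temperature affects how consistent sampled text is.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by a temperature.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens (assumption).
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

In this sketch the probability of the top-scoring token shrinks as the temperature rises, so sampling becomes more varied.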

arXiv information

Authors	Marco Salmè, Rosa Sicilia, Paolo Soda, Valerio Guarrasi
Published	2025-05-02 08:14:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
