WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

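Summary

Multimodal/vision language models (VLMs) are increasingly deployed in healthcare settings around the world, which calls for robust benchmarks to ensure their safety, efficacy, and fairness.
Multiple-choice question-and-answer (QA) datasets derived from national medical examinations have long served as valuable evaluation tools, but existing datasets are mostly text-only and cover only a limited subset of languages and countries.
To address these gaps, we present WorldMedQA-V, an updated multilingual, multimodal benchmark dataset designed to evaluate VLMs in healthcare.
WorldMedQA-V contains 568 labeled multiple-choice QAs paired with 568 medical images from four countries (Brazil, Israel, Japan, and Spain), each provided in the original language and in an English translation validated by native-speaking clinicians.
Baseline performance of common open- and closed-source models is reported for both the local language and the English translation, with and without the image supplied to the model.
The WorldMedQA-V benchmark aims to better match AI systems to the diverse healthcare environments in which they are deployed, fostering more equitable, effective, and representative applications.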

Abstract (original)

Multimodal/vision language models (VLMs) are increasingly being deployed in healthcare settings worldwide, necessitating robust benchmarks to ensure their safety, efficacy, and fairness. Multiple-choice question and answer (QA) datasets derived from national medical examinations have long served as valuable evaluation tools, but existing datasets are largely text-only and available in a limited subset of languages and countries. To address these challenges, we present WorldMedQA-V, an updated multilingual, multimodal benchmarking dataset designed to evaluate VLMs in healthcare. WorldMedQA-V includes 568 labeled multiple-choice QAs paired with 568 medical images from four countries (Brazil, Israel, Japan, and Spain), covering original languages and validated English translations by native clinicians, respectively. Baseline performance for common open- and closed-source models are provided in the local language and English translations, and with and without images provided to the model. The WorldMedQA-V benchmark aims to better match AI systems to the diverse healthcare environments in which they are deployed, fostering more equitable, effective, and representative applications.
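The abstract describes baselines reported along two axes: question language (local vs. English translation) and whether the image is supplied to the model. A minimal sketch of how accuracy could be tabulated over that 2x2 grid follows. The record fields (`language`, `with_image`, `predicted`, `answer`) and the function name are hypothetical and are not taken from the dataset or the paper, and the toy records are made-up values used only to exercise the function, not results.

```python
from collections import defaultdict
from itertools import product


def accuracy_by_condition(records):
    """Return {(language, with_image): accuracy} over graded predictions.

    Each record is a dict with hypothetical keys: "language", "with_image",
    "predicted", and "answer". The real dataset schema may differ.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        key = (record["language"], record["with_image"])
        total[key] += 1
        correct[key] += int(record["predicted"] == record["answer"])
    return {key: correct[key] / total[key] for key in total}


# Toy, made-up predictions purely to exercise the function (not real results).
toy_records = [
    {"language": "local", "with_image": True, "predicted": "A", "answer": "A"},
    {"language": "local", "with_image": False, "predicted": "B", "answer": "A"},
    {"language": "english", "with_image": True, "predicted": "C", "answer": "C"},
    {"language": "english", "with_image": False, "predicted": "C", "answer": "C"},
]

scores = accuracy_by_condition(toy_records)

# Walk the 2x2 grid of evaluation conditions described in the abstract.
for language, with_image in product(("local", "english"), (True, False)):
    print(language, with_image, scores.get((language, with_image)))
```

Keying the scores by the pair of conditions keeps the four settings separate, so the effect of adding the image can be read off within each language rather than from a single pooled accuracy.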

arXiv information

Authors	João Matos, Shan Chen, Siena Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis F. Nakayama, Jose M. M. Pascual-Leone, Guergana Savova, Hugo Aerts, Leo A. Celi, A. Ian Wong, Danielle S. Bitterman, Jack Gallifant
Published	2024-10-16 16:31:24+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
