CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Abstract

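Artificial intelligence has had a major impact on medical applications, particularly with the emergence of Medical Large Vision Language Models (Med-LVLMs), prompting optimism about the future of automated and personalized healthcare.
However, the trustworthiness of Med-LVLMs remains unverified, which poses significant risks for future model deployment.
This paper introduces CARES, which aims to comprehensively evaluate the trustworthiness of Med-LVLMs across the medical domain.
We evaluate the trustworthiness of Med-LVLMs along five dimensions: trustfulness, fairness, safety, privacy, and robustness.
CARES consists of about 41,000 question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions.
Our analysis shows that the models consistently raise trustworthiness concerns, often producing factual inaccuracies and failing to maintain fairness across demographic groups.
They are also vulnerable to attacks and show a lack of privacy awareness.
The benchmark and code are publicly available at https://cares-ai.github.io/.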

Abstract (original)

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in https://cares-ai.github.io/.

arXiv information

Authors	Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao
Published	2024-10-30 17:08:16+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
