AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

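Summary

Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel on medical question-answering benchmarks.
However, real-world clinical application remains limited because doctor-patient interactions are complex.
To address this, we introduce \textbf{AI Hospital}, a multi-agent framework that simulates dynamic medical interactions between a \emph{Doctor} as the player and NPCs including a \emph{Patient}, an \emph{Examiner}, and a \emph{Chief Physician}.
This setup enables realistic evaluation of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation (MVME) benchmark, which uses high-quality Chinese medical records and NPCs to evaluate LLM performance in symptom collection, examination recommendation, and diagnosis.
In addition, we propose a dispute resolution collaborative mechanism that improves diagnostic accuracy through iterative discussion.
Despite these improvements, current LLMs still show a significant performance gap in multi-turn interactions compared with one-step approaches.
Our findings highlight the need for further research to close these gaps and improve the clinical diagnostic capabilities of LLMs.
Our data, code, and experimental results are all open-sourced at \url{https://github.com/LibertFan/AI_Hospital}.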

Summary (original)

Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce \textbf{AI Hospital}, a multi-agent framework simulating dynamic medical interactions between \emph{Doctor} as player and NPCs including \emph{Patient}, \emph{Examiner}, \emph{Chief Physician}. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs’ performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs’ clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at \url{https://github.com/LibertFan/AI_Hospital}.

arXiv information

Authors	Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xi, Fei Huang, Jingren Zhou
Published	2024-06-28 03:11:48+00:00

Source and services used

arxiv.jp, Google
