Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

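Summary

Large language models (LLMs) have demonstrated remarkable proficiency in human interaction, but their application in the medical field has not yet been sufficiently explored.
Previous work has focused mainly on measuring medical knowledge through examinations, which is far removed from realistic scenarios and insufficient for assessing LLM capabilities on clinical tasks.
To strengthen the application of LLMs in healthcare, this paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS), targeting the gap between traditional LLM evaluation and the nuanced demands of clinical practice.
Unlike prior methods that rely on static assessments of medical knowledge, AIE and SAPS provide a dynamic, realistic platform for evaluating LLMs through multi-turn doctor-patient simulations.
This approach approximates real clinical scenarios more closely and enables detailed analysis of LLM behavior in response to complex patient interactions.
Extensive experimental validation demonstrates the effectiveness of the AIE framework, with results that align well with human evaluations, underscoring its potential to revolutionize medical LLM testing for improved healthcare delivery.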

Summary (original)

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored. Previous works mainly focus on the performance of medical knowledge with examinations, which is far from the realistic scenarios, falling short in assessing the abilities of LLMs on clinical tasks. In the quest to enhance the application of Large Language Models (LLMs) in healthcare, this paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS), targeting the gap between traditional LLM evaluations and the nuanced demands of clinical practice. Unlike prior methods that rely on static medical knowledge assessments, AIE and SAPS provide a dynamic, realistic platform for assessing LLMs through multi-turn doctor-patient simulations. This approach offers a closer approximation to real clinical scenarios and allows for a detailed analysis of LLM behaviors in response to complex patient interactions. Our extensive experimental validation demonstrates the effectiveness of the AIE framework, with outcomes that align well with human evaluations, underscoring its potential to revolutionize medical LLM testing for improved healthcare delivery.

arXiv information

Authors	Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang
Published	2024-03-14 08:05:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
