Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models

Summary

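In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a large language model (LLM).
BooksMed emulates human cognitive processes to deliver evidence-based, reliable responses, and uses the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to quantify the strength of evidence.
Assessing clinical decision-making appropriately requires an evaluation metric that is clinically aligned and validated.
As a solution, we present ExpertMedQA, a multispecialty clinical benchmark composed of open-ended, expert-level clinical questions and validated by a diverse group of medical professionals.
ExpertMedQA rigorously evaluates LLM performance by demanding an in-depth understanding and critical appraisal of up-to-date clinical literature.
BooksMed outperforms the existing state-of-the-art models Med-PaLM 2, Almanac, and ChatGPT in a variety of medical scenarios.
A framework that mimics human cognitive stages could therefore be a useful tool for providing reliable, evidence-based responses to clinical inquiries.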

Abstract (original)

In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify evidence strength. For clinical decision-making to be appropriately assessed, an evaluation metric that is clinically aligned and validated is required. As a solution, we present ExpertMedQA, a multispecialty clinical benchmark comprised of open-ended, expert-level clinical questions, and validated by a diverse group of medical professionals. By demanding an in-depth understanding and critical appraisal of up-to-date clinical literature, ExpertMedQA rigorously evaluates LLM performance. BooksMed outperforms existing state-of-the-art models Med-PaLM 2, Almanac, and ChatGPT in a variety of medical scenarios. Therefore, a framework that mimics human cognitive stages could be a useful tool for providing reliable and evidence-based responses to clinical inquiries.

arXiv information

Authors	Khushboo Verma, Marina Moore, Stephanie Wottrich, Karla Robles López, Nishant Aggarwal, Zeel Bhatt, Aagamjit Singh, Bradford Unroe, Salah Basheer, Nitish Sachdeva, Prinka Arora, Harmanjeet Kaur, Tanupreet Kaur, Tevon Hood, Anahi Marquez, Tushar Varshney, Nanfu Deng, Azaan Ramani, Pawanraj Ishwara, Maimoona Saeed, Tatiana López Velarde Peña, Bryan Barksdale, Sushovan Guha, Satwant Kumar
Published	2023-10-17 13:39:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
