On the Performance of an Explainable Language Model on PubMedQA

Summary

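Large language models (LLMs) have shown significant abilities in retrieving medical knowledge, reasoning over it and answering medical questions comparably to physicians.
However, these models are not interpretable, hallucinate, are difficult to maintain and require enormous compute resources for training and inference.
This paper reports results on the PubMedQA dataset from Gyan, an explainable language model based on an alternative architecture.
The Gyan LLM is a compositional language model in which the model is decoupled from knowledge.
Gyan is trustworthy and transparent, does not hallucinate, and does not require significant training or compute resources.
Gyan transfers easily across domains.
Gyan-4.3 achieves state-of-the-art (SOTA) results on PubMedQA with 87.1% accuracy, compared with 82% for MedPrompt (based on GPT-4) and 81.8% for Med-PaLM 2 (Google and DeepMind).
Results on other medical datasets (MedQA, MedMCQA and MMLU-Medicine) will be reported in the future.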

Summary (original)

Large language models (LLMs) have shown significant abilities in retrieving medical knowledge, reasoning over it and answering medical questions comparably to physicians. However, these models are not interpretable, hallucinate, are difficult to maintain and require enormous compute resources for training and inference. In this paper, we report results from Gyan, an explainable language model based on an alternative architecture, on the PubMedQA data set. The Gyan LLM is a compositional language model and the model is decoupled from knowledge. Gyan is trustable, transparent, does not hallucinate and does not require significant training or compute resources. Gyan is easily transferable across domains. Gyan-4.3 achieves SOTA results on PubMedQA with 87.1% accuracy compared to 82% by MedPrompt based on GPT-4 and 81.8% by Med-PaLM 2 (Google and DeepMind). We will be reporting results for other medical data sets (MedQA, MedMCQA and MMLU-Medicine) in the future.
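For readers comparing the reported figures (87.1% vs 82% vs 81.8%), it may help to recall that PubMedQA questions are answered with one of three labels (yes, no, maybe), so accuracy is, as we understand it, simply the share of questions whose predicted label matches the reference label. The sketch below illustrates that calculation only; the function name `accuracy` and the toy data are hypothetical and are not taken from the paper or from Gyan's evaluation code.

```python
from typing import Sequence

# PubMedQA answers are one of three labels.
VALID_LABELS = {"yes", "no", "maybe"}


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose predicted label matches the reference label.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    for label in (*predictions, *gold):
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    # Count exact label matches and divide by the number of questions.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical toy data, not drawn from PubMedQA or the paper.
preds = ["yes", "no", "maybe", "yes"]
gold = ["yes", "no", "no", "yes"]
print(f"{accuracy(preds, gold):.1%}")  # 75.0%
```

Under this definition, a "maybe" predicted for a "no" question counts as wrong, with no partial credit.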

arXiv information

Authors	Venkat Srinivasan, Vishaal Jatav, Anushka Chandrababu, Geetika Sharma
Published	2025-04-07 13:42:02+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google