Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework

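Summary

Large language models (LLMs) are increasingly adopted in medical question-answering (QA) scenarios.
However, LLMs can generate hallucinations and nonfactual information, which undermines their trustworthiness in high-stakes medical tasks.
Conformal prediction (CP) provides a statistically rigorous framework for marginal (average) coverage guarantees, but it has seen limited exploration in medical QA.
This paper proposes an enhanced CP framework for medical multiple-choice question-answering (MCQA) tasks.
By associating the nonconformity score with the frequency score of the correct option and leveraging self-consistency, the framework addresses the opacity of model internals and incorporates a risk control strategy with a monotonic loss function.
Evaluated on the MedMCQA, MedQA, and MMLU datasets using four off-the-shelf LLMs, the proposed method meets the specified error rate guarantees, reduces the average prediction set size as the risk level increases, and offers a promising uncertainty evaluation metric for LLMs.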

Summary (original)

Large language models (LLMs) are increasingly adopted in medical question-answering (QA) scenarios. However, LLMs can generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks. Conformal Prediction (CP) provides a statistically rigorous framework for marginal (average) coverage guarantees but has limited exploration in medical QA. This paper proposes an enhanced CP framework for medical multiple-choice question-answering (MCQA) tasks. By associating the non-conformance score with the frequency score of correct options and leveraging self-consistency, the framework addresses internal model opacity and incorporates a risk control strategy with a monotonic loss function. Evaluated on MedMCQA, MedQA, and MMLU datasets using four off-the-shelf LLMs, the proposed method meets specified error rate guarantees while reducing average prediction set size with increased risk level, offering a promising uncertainty evaluation metric for LLMs.
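The abstract does not spell out the procedure, so the following is only a minimal sketch of how a frequency-based split conformal procedure for MCQA might look, not the authors' implementation. It assumes that each question is answered by sampling the model several times (self-consistency), that an option's frequency score is its share of the samples, and that the nonconformity score is one minus the frequency of the correct option. The function names `frequency_scores`, `calibrate_threshold`, and `prediction_set` are hypothetical.

```python
import math
from collections import Counter


def frequency_scores(samples, options):
    """Share of sampled answers that picked each option (assumed frequency score)."""
    counts = Counter(samples)
    return {opt: counts.get(opt, 0) / len(samples) for opt in options}


def calibrate_threshold(cal_scores, alpha):
    """Split conformal quantile of calibration nonconformity scores.

    cal_scores holds one value per calibration question, here assumed to be
    1 - frequency of the correct option. alpha is the target error rate.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every option.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(freqs, threshold):
    """Options whose nonconformity (1 - frequency) does not exceed the threshold."""
    return {opt for opt, f in freqs.items() if 1.0 - f <= threshold}
```

Under these assumptions, a larger `alpha` lowers the threshold, so fewer options pass the test. This matches the abstract's observation that the average prediction set size shrinks as the risk level increases.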

arXiv information

Authors	Yusong Ke, Hongru Lin, Yuting Ruan, Junya Tang, Li Li
Published	2025-05-08 16:52:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
