GREEN: Generative Radiology Report Evaluation and Error Notation

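Summary

Evaluating radiology reports is difficult because factual correctness is essential to accurate medical communication about medical images.
Existing automatic metrics either fail to account for factual correctness (e.g., BLEU and ROUGE) or offer limited interpretability (e.g., F1CheXpert and F1RadGraph).
This paper introduces GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that uses the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively.
Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human-interpretable explanations of clinically significant errors, enabling feedback loops with end users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts.
The GREEN metric is validated by comparing it to GPT-4, as well as to the error counts of 6 experts and the preferences of 2 experts.
Compared to previous approaches, the method shows both higher correlation with expert error counts and higher alignment with expert preferences.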

Abstract (original)

Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.
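
The validation described above compares metric scores with expert error counts. As a minimal sketch of how such an agreement check could be computed, the snippet below implements Kendall's tau-b rank correlation in plain Python. The function name `kendall_tau_b` and the toy numbers are illustrative assumptions, not values or code from the paper, and the abstract does not state which correlation statistic the authors actually use.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("correlation undefined: at least one input is constant")
    return (concordant - discordant) / denom


# Made-up toy data (hypothetical, for illustration only):
# expert error counts per report, and a metric score where higher means better.
expert_error_counts = [0, 1, 2, 3, 4]
metric_scores = [0.9, 0.8, 0.6, 0.5, 0.1]

# A score that perfectly tracks quality ranks reports in the exact reverse
# order of their error counts, so the correlation is -1.0.
print(kendall_tau_b(expert_error_counts, metric_scores))  # -1.0
```

A quality score is expected to correlate negatively with error counts, so when comparing several metrics under this setup, the one with the larger absolute correlation tracks the expert counts more closely.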

arXiv information

Authors	Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, Christian Bluethgen, Arne Edward Michalson, Michael Moseley, Curtis Langlotz, Akshay S Chaudhari, Jean-Benoit Delbrouck
Published	2025-01-22 06:24:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
