GREEN: Generative Radiology Report Evaluation and Error Notation

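Summary

Evaluating radiology reports is a difficult problem: accurate medical communication about medical images is essential, so factual correctness is critical. Existing automatic evaluation metrics either fail to account for factual correctness (e.g., BLEU and ROUGE) or offer limited interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that uses the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared with current metrics, GREEN offers: 1) a score aligned with expert preferences; 2) human-interpretable explanations of clinically significant errors, enabling feedback loops with end users; and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate the GREEN metric by comparing it with GPT-4, with the error counts of 6 experts, and with the preferences of 2 experts. Compared with previous approaches, our method shows both a higher correlation with expert error counts and a higher alignment with expert preferences.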

Summary (original)

Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.
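
The abstract reports that GREEN correlates more strongly with expert error counts than earlier metrics do, without stating how that correlation is computed. As a rough illustration only, the sketch below compares two metrics by their Pearson correlation with expert error counts. The choice of Pearson correlation, the function name pearson, and all of the numbers are assumptions made for this example; none of them come from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: clinically significant errors counted per report.
# These values are invented for illustration and are not results from the paper.
expert_errors = [0, 2, 1, 4, 3]    # counts assigned by a human expert
metric_a_errors = [0, 2, 1, 3, 3]  # counts from a metric that tracks the expert
metric_b_errors = [1, 1, 2, 1, 2]  # counts from a metric that does not

r_a = pearson(expert_errors, metric_a_errors)
r_b = pearson(expert_errors, metric_b_errors)
print(f"metric A vs expert: r = {r_a:.3f}")
print(f"metric B vs expert: r = {r_b:.3f}")
```

On this toy data, metric A would be the better proxy for expert judgement because its error counts rise and fall with the expert's. A rank-based statistic could be substituted for pearson without changing the comparison logic.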

arXiv information

Authors	Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, Christian Bluethgen, Arne Edward Michalson, Michael Moseley, Curtis Langlotz, Akshay S Chaudhari, Jean-Benoit Delbrouck
Published	2024-05-06 16:04:03+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
