GPT-4V Cannot Generate Radiology Reports Yet

Summary

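GPT-4V is reported to have strong multimodal capabilities, which has drawn interest in using it to automate radiology report writing, but it has not yet been thoroughly evaluated for this task.
This study systematically evaluates GPT-4V on radiology report generation using two chest X-ray report datasets, MIMIC-CXR and IU X-Ray.
We tried generating reports directly with GPT-4V under several prompting strategies and found that it performs very poorly on both lexical metrics and clinical efficacy metrics.
To understand the low performance, we decompose the task into two steps: 1) a medical image reasoning step that predicts medical condition labels from the image; and
2) a report synthesis step that generates a report from the (ground-truth) conditions.
GPT-4V's performance on image reasoning is consistently low across different prompts.
In fact, the distribution of model-predicted labels stays constant regardless of which ground-truth conditions are present in the image, suggesting that the model is not interpreting chest X-rays meaningfully.
Even when given the ground-truth conditions for report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2.
Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.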
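
The observation that predicted labels do not depend on the ground-truth conditions can be illustrated with a small check. The following is a minimal sketch, not the authors' evaluation code: the record format, the function names `prediction_rates` and `max_rate_gap`, and the toy records are all assumptions made for illustration. It compares how often each label is predicted on images that do or do not carry a given ground-truth condition; a gap near zero means the predictions carry no information about that condition.

```python
from collections import Counter


def prediction_rates(records, condition, present):
    """Fraction of images on which each label is predicted, restricted to
    images where `condition` is (present=True) or is not (present=False)
    among the ground-truth labels."""
    counts = Counter()
    n_images = 0
    for truth, predicted in records:
        if (condition in truth) == present:
            counts.update(predicted)  # each predicted label counted once per image
            n_images += 1
    if n_images == 0:
        return {}
    return {label: c / n_images for label, c in counts.items()}


def max_rate_gap(rates_a, rates_b):
    """Largest absolute difference in prediction rate over all labels."""
    labels = set(rates_a) | set(rates_b)
    return max(
        (abs(rates_a.get(l, 0.0) - rates_b.get(l, 0.0)) for l in labels),
        default=0.0,
    )


# Toy, made-up records: (ground-truth label set, predicted label set).
# These are NOT results from the paper.
records = [
    ({"Edema"}, {"Cardiomegaly", "Edema"}),
    ({"Edema"}, {"Cardiomegaly"}),
    ({"Pneumonia"}, {"Cardiomegaly", "Edema"}),
    ({"Pneumonia"}, {"Cardiomegaly"}),
]

with_edema = prediction_rates(records, "Edema", present=True)
without_edema = prediction_rates(records, "Edema", present=False)
gap = max_rate_gap(with_edema, without_edema)
print(with_edema, without_edema, gap)
```

In this toy data the predicted labels are the same whether or not edema is present, so the gap is zero, which is the kind of pattern the summary describes.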

Abstract (original)

GPT-4V’s purported strong multimodal abilities raise interests in using it to automate radiology report writing, but there lacks thorough evaluations. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V’s performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.

arXiv information

Authors	Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan
Published	2024-10-09 15:23:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
