ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification

Abstract

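Recent advances in reasoning-enhanced large language models (LLMs) and multimodal LLMs (MLLMs) have substantially improved performance on complex tasks, but medical AI models often overlook the structured reasoning processes inherent in clinical practice.
This work presents ChestX-Reasoner, a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports, reflecting the step-by-step reasoning that radiologists follow.
A large dataset is constructed by extracting and refining reasoning chains from routine radiology reports.
The two-stage training framework combines supervised fine-tuning with reinforcement learning guided by process rewards, to better align the model's reasoning with clinical standards.
The work introduces RadRBench-CXR, a comprehensive benchmark of 59K visual question answering samples with 301K clinically validated reasoning steps, and proposes RadRScore, a metric that evaluates reasoning factuality, completeness, and effectiveness.
ChestX-Reasoner outperforms existing medical and general-domain MLLMs in both diagnostic accuracy and reasoning ability, achieving 16%, 5.9%, and 18% improvements in reasoning ability over the best medical MLLM, the best general MLLM, and its base model, respectively, as well as 3.3%, 24%, and 27% improvements in outcome accuracy.
All resources are open-sourced to facilitate further research on medical reasoning in MLLMs.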

Abstract (original)

Recent advances in reasoning-enhanced large language models (LLMs) and multimodal LLMs (MLLMs) have significantly improved performance in complex tasks, yet medical AI models often overlook the structured reasoning processes inherent in clinical practice. In this work, we present ChestX-Reasoner, a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports, reflecting the step-by-step reasoning followed by radiologists. We construct a large dataset by extracting and refining reasoning chains from routine radiology reports. Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards. We introduce RadRBench-CXR, a comprehensive benchmark featuring 59K visual question answering samples with 301K clinically validated reasoning steps, and propose RadRScore, a metric evaluating reasoning factuality, completeness, and effectiveness. ChestX-Reasoner outperforms existing medical and general-domain MLLMs in both diagnostic accuracy and reasoning ability, achieving 16%, 5.9%, and 18% improvements in reasoning ability compared to the best medical MLLM, the best general MLLM, and its base model, respectively, as well as 3.3%, 24%, and 27% improvements in outcome accuracy. All resources are open-sourced to facilitate further research in medical reasoning MLLMs.

arXiv information

Authors	Ziqing Fan, Cheng Liang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Published	2025-04-29 16:48:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
