Can Modern LLMs Act as Agent Cores in Radiology Environments?

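Summary

Advances in large language models (LLMs) have paved the way for LLM-based agent systems with improved accuracy and interpretability across a wide range of domains.
Radiology, with its complex analytical requirements, is an ideal field for applying these agents.
This paper investigates a prerequisite question for building concrete radiology agents: "Can modern LLMs act as agent cores in radiology environments?"
To answer it, we introduce RadABench, which makes three contributions. First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents, generated from an extensive taxonomy covering 6 anatomies, 5 imaging modalities, 10 tool categories, and 11 radiology tasks.
Second, we propose RadABench-EvalPlat, a new evaluation platform for agents that features a prompt-driven workflow and can simulate a wide range of radiology toolsets.
Third, we evaluate 7 leading LLMs on the benchmark from 5 perspectives using multiple metrics.
Our findings show that although current LLMs demonstrate strong capabilities in many areas, they are not yet advanced enough to serve as the central agent core of a fully operational radiology agent system.
We also identify key factors that affect the performance of LLM-based agent cores, offering clinicians insight into how to apply agent systems effectively in real-world radiology practice.
All of our code and data are open-sourced at https://github.com/MAGIC-AI4Med/RadABench.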

Summary (original)

Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains. Radiology, with its complex analytical requirements, is an ideal field for the application of these agents. This paper aims to investigate the pre-requisite question for building concrete radiology agents which is, ‘Can modern LLMs act as agent cores in radiology environments?’ To investigate it, we introduce RadABench with three-fold contributions: First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents, generated from an extensive taxonomy encompassing 6 anatomies, 5 imaging modalities, 10 tool categories, and 11 radiology tasks. Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow and the capability to simulate a wide range of radiology toolsets. Third, we assess the performance of 7 leading LLMs on our benchmark from 5 perspectives with multiple metrics. Our findings indicate that while current LLMs demonstrate strong capabilities in many areas, they are still not sufficiently advanced to serve as the central agent core in a fully operational radiology agent system. Additionally, we identify key factors influencing the performance of LLM-based agent cores, offering insights for clinicians on how to apply agent systems in real-world radiology practices effectively. All of our code and data are open-sourced in https://github.com/MAGIC-AI4Med/RadABench.

arXiv information

Authors	Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie
Published	2024-12-12 18:20:16+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
