An evaluation of GPT models for phenotype concept recognition

Summary

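Objective: Clinical deep phenotyping plays a critical role both in diagnosing patients with rare disorders and in building care coordination plans.
The process relies on modelling and curating patient profiles with ontology concepts, usually drawn from the Human Phenotype Ontology.
Machine learning methods have been widely adopted to support this phenotype concept recognition task.
Given the significant shift toward large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT in clinical deep phenotyping.
Materials and Methods: The experimental setup comprised seven prompts of varying specificity, two GPT models (gpt-3.5 and gpt-4.0), and an established gold standard for phenotype recognition.
Results: Our results show that these models have not yet achieved state-of-the-art performance.
The best run, using few-shot learning, achieved an F1 score of 0.41, compared with 0.62 for the current best-in-class tool.
Conclusion: The non-deterministic nature of the outcomes and the lack of concordance between runs using the same prompt and input make the use of these LLMs in clinical settings problematic.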
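The summary compares systems by F1 score (0.41 for the best GPT run versus 0.62 for the best-in-class tool) but does not say how matches are counted. As a minimal sketch, assuming each document's output is a set of HPO concept IDs and that a prediction counts only on an exact ID match with the gold standard, micro-averaged precision, recall and F1 can be computed as below. The function name `micro_prf` and the exact-match rule are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document HPO ID sets.

    Hypothetical helper: assumes exact matching on concept identifiers
    (e.g. "HP:0001250"); the paper's actual matching criteria may differ.
    """
    tp = fp = fn = 0
    # Visit every document that appears in either the gold standard or the output.
    for doc_id in gold.keys() | predicted.keys():
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)  # annotated in the gold standard and predicted
        fp += len(p - g)  # predicted but absent from the gold standard
        fn += len(g - p)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro-averaging pools counts across all documents before computing the ratios, so documents with many annotations weigh more; a macro-averaged variant (per-document F1, then the mean) is an equally plausible reading of the abstract.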

Abstract (original)

Objective: Clinical deep phenotyping plays a critical role both in the diagnosis of patients with rare disorders and in building care coordination plans. The process relies on modelling and curating patient profiles using ontology concepts, usually from the Human Phenotype Ontology. Machine learning methods have been widely adopted to support this phenotype concept recognition task. With the significant shift toward large language models (LLMs) for most NLP tasks, we herein examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT in clinical deep phenotyping. Materials and Methods: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5 and gpt-4.0) and an established gold standard for phenotype recognition. Results: Our results show that, currently, these models have not yet achieved state-of-the-art performance. The best run, using few-shot learning, achieved a 0.41 F1 score, compared to a 0.62 F1 score achieved by the current best-in-class tool. Conclusion: The non-deterministic nature of the outcomes and the lack of concordance between different runs using the same prompt and input make the use of these LLMs in clinical settings problematic.
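The conclusion points to a lack of concordance between runs that use the same prompt and input, without naming a metric. One simple way to quantify it, offered here only as an assumed stand-in and not as the authors' measure, is the mean pairwise Jaccard similarity between the sets of HPO IDs returned by repeated runs on one input. The helpers `jaccard` and `mean_pairwise_concordance` are hypothetical names.

```python
from itertools import combinations
from statistics import mean
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two concept sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_concordance(runs: List[Set[str]]) -> float:
    """Average Jaccard similarity over every pair of runs on the same input.

    Hypothetical metric: 1.0 means every run returned the same HPO IDs;
    lower values indicate the run-to-run disagreement the abstract describes.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure concordance")
    return mean(jaccard(a, b) for a, b in combinations(runs, 2))
```

For example, three runs returning `{"HP:0001250", "HP:0001263"}`, `{"HP:0001250"}` and `{"HP:0001250", "HP:0001263"}` give pairwise scores of 0.5, 1.0 and 0.5, so a mean concordance of about 0.67. Jaccard ignores concepts absent from both runs, which suits open-ended extraction where the candidate vocabulary is the whole ontology.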

arXiv information

Authors	Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Chris J Mungall, Justin T Reese
Published	2023-09-29 12:06:55+00:00
arXiv page	arxiv_id(pdf)

Provider, service used

arxiv.jp, Google
