A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

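Summary

Recently, large language models (LLMs) have demonstrated an impressive ability to solve a wide range of tasks.
However, despite their success across various tasks, no prior work has investigated their capabilities in the biomedical domain.
To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks.
For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs on 6 diverse biomedical tasks across 26 datasets.
To the best of our knowledge, this is the first work to conduct an extensive evaluation and comparison of various LLMs in the biomedical domain.
Interestingly, our evaluation shows that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models.
This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain.
We also find that no single LLM outperforms the others on all tasks, and that the performance of different LLMs may vary depending on the task.
Although their performance is still quite poor compared with biomedical models fine-tuned on large training sets, our findings show that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated datasets.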

Summary (original)

Recently, Large Language Models (LLM) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that not a single LLM can outperform other LLMs in all tasks, with the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

arXiv information

Authors	Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang
Published	2023-10-06 14:16:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
