Towards Safe and Aligned Large Language Models for Medicine

Summary

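The capabilities of large language models (LLMs) are advancing at a remarkable pace, leaving even their own developers grappling with the depth of their potential and risks.
Initial steps have been taken to evaluate the safety and alignment of general-knowledge LLMs, and these have exposed some weaknesses; to our knowledge, however, the safety and alignment of medical LLMs have not been evaluated, despite the risks they pose to personal health and safety, public health and safety, and human rights.
To this end, we carry out the first safety evaluation of medical LLMs.
Specifically, we set forth a definition of medical safety and alignment for medical artificial intelligence systems, develop a dataset of harmful medical questions for evaluating the medical safety and alignment of an LLM, evaluate both the general and the medical safety and alignment of medical LLMs, demonstrate fine-tuning as an effective mitigation strategy, and discuss broader, large-scale approaches that the machine learning community uses to develop safe and aligned LLMs.
We hope this work sheds light on the safety and alignment of medical LLMs and motivates future work that studies it and develops additional mitigation strategies, minimizing the risk of harm from LLMs in medicine.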

Summary (original)

The capabilities of large language models (LLMs) have been progressing at a breathtaking speed, leaving even their own developers grappling with the depth of their potential and risks. While initial steps have been taken to evaluate the safety and alignment of general-knowledge LLMs, exposing some weaknesses, to our knowledge, the safety and alignment of medical LLMs has not been evaluated despite their risks for personal health and safety, public health and safety, and human rights. To this end, we carry out the first safety evaluation for medical LLMs. Specifically, we set forth a definition of medical safety and alignment for medical artificial intelligence systems, develop a dataset of harmful medical questions to evaluate the medical safety and alignment of an LLM, evaluate both general and medical safety and alignment of medical LLMs, demonstrate fine-tuning as an effective mitigation strategy, and discuss broader, large-scale approaches used by the machine learning community to develop safe and aligned LLMs. We hope that this work casts light on the safety and alignment of medical LLMs and motivates future work to study it and develop additional mitigation strategies, minimizing the risks of harm of LLMs in medicine.

arXiv information

Authors	Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
Published	2024-03-06 14:34:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
