Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

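Summary

The mental health of children and adolescents has been steadily deteriorating over the past few years.
The recent emergence of large language models (LLMs) holds promise for cost- and time-efficient scaling of monitoring and intervention. However, although issues such as school bullying and eating disorders are especially prevalent, previous studies have not investigated performance in this domain or in open information extraction, where the set of answers is not predetermined.
We create a new dataset of Reddit posts from adolescents aged 12 to 19, annotated by expert psychiatrists for the following categories: trauma, precarity, condition, symptoms, suicidality and treatment, and compare the expert labels with annotations from two top-performing LLMs (GPT3.5 and GPT4).
In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it.
We find that GPT4 is on par with human inter-annotator agreement and that performance on synthetic data is substantially higher. However, the model still occasionally makes errors on negation and factuality, and the higher performance on synthetic data is driven by the greater complexity of real data rather than by any inherent advantage.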

Summary (original)

Mental health in children and adolescents has been steadily deteriorating over the past few years. The recent advent of Large Language Models (LLMs) offers much hope for cost- and time-efficient scaling of monitoring and intervention, yet despite especially prevalent issues such as school bullying and eating disorders, previous studies have not investigated performance in this domain or for open information extraction, where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT, and compare expert labels to annotations from two top-performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher. However, we find that the model still occasionally errs on issues of negation and factuality, and that the higher performance on synthetic data is driven by the greater complexity of real data rather than by an inherent advantage.
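As a rough illustration of what comparing expert labels with model annotations can look like, the sketch below computes Cohen's kappa over per-post binary flags (category present or absent). This is an assumption for illustration only: the abstract does not say which agreement measure the authors use, and open information extraction involves free-text answers rather than simple flags. The function name `cohen_kappa` and the toy flags are hypothetical and are not data from the paper.

```python
from collections import Counter

# Category names taken from the abstract.
CATEGORIES = ["TRAUMA", "PRECARITY", "CONDITION",
              "SYMPTOMS", "SUICIDALITY", "TREATMENT"]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of labels.

    Hypothetical helper: the source does not state its agreement metric.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up flags for one category over eight posts (1 = present).
    expert = [1, 1, 0, 0, 1, 0, 1, 0]
    model = [1, 1, 0, 0, 1, 0, 0, 0]
    print(CATEGORIES[0], cohen_kappa(expert, model))
```

Running the same comparison between two human annotators gives a baseline against which a model-versus-expert score can be read, which is the sense in which a model can be "on par" with inter-annotator agreement.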

arXiv information

Authors	Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin
Published	2024-04-26 11:36:28+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
