Measuring Psychological Depth in Language Models

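Abstract

Evaluations of creative stories generated by large language models (LLMs) often focus on objective properties of the text, such as style, coherence, and toxicity.
These metrics are indispensable, but they say nothing about a story's subjective, psychological impact from the reader's perspective.
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic, narratively complex stories that provoke emotion, empathy, and engagement.
We validate the framework empirically by showing that humans can rate stories consistently on the PDS (Krippendorff's alpha of 0.72).
We also explore techniques for automating the PDS so that future analyses can scale easily.
GPT-4o, combined with a novel Mixture-of-Personas (MoP) prompting strategy, achieves an average Spearman correlation of 0.51 with human judgment, while Llama-3-70B scores as high as 0.68 for empathy.
Finally, we compared the depth of stories written by humans and by LLMs.
Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly rated human-written stories sourced from Reddit.
By shifting the focus from text to reader, the Psychological Depth Scale offers a validated, automated, and systematic means of measuring the capacity of LLMs to connect with humans through the stories they tell.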

Abstract (original)

Evaluations of creative stories generated by large language models (LLMs) often focus on objective properties of the text, such as its style, coherence, and toxicity. While these metrics are indispensable, they do not speak to a story’s subjective, psychological impact from a reader’s perspective. We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM’s ability to produce authentic and narratively complex stories that provoke emotion, empathy, and engagement. We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff’s alpha). We also explore techniques for automating the PDS to easily scale future analyses. GPT-4o, combined with a novel Mixture-of-Personas (MoP) prompting strategy, achieves an average Spearman correlation of 0.51 with human judgment while Llama-3-70B scores as high as 0.68 for empathy. Finally, we compared the depth of stories authored by both humans and LLMs. Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit. By shifting the focus from text to reader, the Psychological Depth Scale is a validated, automated, and systematic means of measuring the capacity of LLMs to connect with humans through the stories they tell.
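
The abstract reports agreement between automated and human ratings as a Spearman correlation. As a rough illustration of what that statistic measures, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the helper names `average_ranks` and `spearman` and the toy ratings are hypothetical, chosen only for this example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy ratings on a 1-5 scale (illustrative only, not from the paper).
human_scores = [1, 2, 3, 4, 5]
llm_scores = [2, 1, 4, 3, 5]
rho = spearman(human_scores, llm_scores)
print(round(rho, 2))  # prints 0.8
```

A value near 1 means the two raters order the stories almost identically, and a value near 0 means their orderings are unrelated. For real data, a maintained implementation such as `scipy.stats.spearmanr` is likely the better choice.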

arXiv information

Authors	Fabrice Harel-Canada, Hanyu Zhou, Sreya Mupalla, Zeynep Yildiz, Amit Sahai, Nanyun Peng
Published	2024-06-18 14:51:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

