Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

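Summary

Large language models (LLMs) have demonstrated exceptional task-solving capabilities and increasingly take on roles resembling human-like assistants.
As LLMs become more widely integrated into society, interest has grown in whether they exhibit psychological attributes, whether those attributes are stable, and whether investigating them can deepen the understanding of LLM behavior.
Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, comprising psychological dimension identification, assessment dataset curation, and assessment with results validation.
Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
The benchmark includes 13 datasets featuring diverse scenarios and item types.
Our findings indicate that LLMs exhibit a broad spectrum of psychological attributes.
We also uncover discrepancies between LLMs' self-reported traits and their behavior in real-world scenarios.
This paper demonstrates a thorough psychometric assessment of LLMs and offers insights into reliable evaluation and potential applications in AI and the social sciences.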

Abstract (original)

Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable, inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, including psychological dimension identification, assessment dataset curation, and assessment with results validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. This benchmark includes thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs’ self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and social sciences.

arXiv information

Authors	Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun
Published	2024-06-25 16:09:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
