Re-evaluating Theory of Mind evaluation in large language models

Summary

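The question of whether large language models (LLMs) have Theory of Mind (ToM), often defined as the ability to reason about others' mental states, has attracted significant scientific and public interest.
However, the evidence on whether LLMs possess ToM is mixed, and the recent growth in evaluations has not led to convergence.
Here, we draw on cognitive science to re-evaluate the state of ToM evaluation in LLMs.
We argue that a major reason for the disagreement over whether LLMs have ToM is a lack of clarity about whether models should be expected to match human behaviors or the computations underlying those behaviors.
We also highlight ways in which current evaluations may deviate from 'pure' measurements of ToM abilities, which adds to the confusion.
We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication, which could advance our understanding of both artificial systems and human cognition.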

Abstract (original)

The question of whether large language models (LLMs) possess Theory of Mind (ToM) — often defined as the ability to reason about others’ mental states — has sparked significant scientific and public interest. However, the evidence as to whether LLMs possess ToM is mixed, and the recent growth in evaluations has not resulted in a convergence. Here, we take inspiration from cognitive science to re-evaluate the state of ToM evaluation in LLMs. We argue that a major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviors, or the computations underlying those behaviors. We also highlight ways in which current evaluations may be deviating from ‘pure’ measurements of ToM abilities, which also contributes to the confusion. We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication, which could advance our understanding of artificial systems as well as human cognition.

arXiv information

Authors	Jennifer Hu, Felix Sosa, Tomer Ullman
Published	2025-02-28 14:36:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
