Do personality tests generalize to Large Language Models?

Summary

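As large language models (LLMs) behave increasingly like humans in text-based dialogue, it has become common to evaluate various properties of these models using tests originally designed for humans.
Reusing existing tests is a resource-efficient way to evaluate LLMs, but careful adjustment is usually needed to ensure that test results are valid even across human sub-populations.
It is therefore unclear to what extent the validity of different tests generalizes to LLMs.
In this work, we provide evidence that LLMs' responses to personality tests systematically deviate from typical human responses, suggesting that these results cannot be interpreted in the same way as human test results.
Specifically, reverse-coded items (e.g., "I am introverted" and "I am extraverted") are often both answered affirmatively by LLMs.
Furthermore, the variation across prompts designed to "steer" LLMs toward simulating particular personality types does not follow the clear separation into five independent personality factors observed in human samples.
In light of these results, we consider it important to pay more attention to the validity of tests for LLMs before drawing strong conclusions about potentially ill-defined concepts such as an LLM's "personality".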

Summary (original)

With large language models (LLMs) appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate various properties of these models using tests originally designed for humans. While re-using existing tests is a resource-efficient way to evaluate LLMs, careful adjustments are usually required to ensure that test results are even valid across human sub-populations. Thus, it is not clear to what extent different tests’ validity generalizes to LLMs. In this work, we provide evidence that LLMs’ responses to personality tests systematically deviate from typical human responses, implying that these results cannot be interpreted in the same way as human test results. Concretely, reverse-coded items (e.g. ‘I am introverted’ vs ‘I am extraverted’) are often both answered affirmatively by LLMs. In addition, variation across different prompts designed to ‘steer’ LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors from human samples. In light of these results, we believe it is important to pay more attention to tests’ validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs’ ‘personality’.
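To make the reverse-coding problem concrete, the sketch below shows how a direct item and its reverse-coded counterpart are typically combined into one trait score, and how a "both answered affirmatively" pattern can be flagged. It assumes a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree) with 3 as the neutral midpoint; the function names and the response data are hypothetical illustrations, not the authors' code or results.

```python
SCALE_MAX = 5  # assumed 5-point Likert scale: 1 = strongly disagree ... 5 = strongly agree
MIDPOINT = 3   # assumed neutral answer ("neither agree nor disagree")


def reverse_score(response: int) -> int:
    """Flip a reverse-coded item so that high values point in the trait's direction."""
    return SCALE_MAX + 1 - response


def pair_score(direct: int, reverse: int) -> float:
    """Trait score from a direct item (e.g. 'I am extraverted') and its
    reverse-coded counterpart (e.g. 'I am introverted')."""
    return (direct + reverse_score(reverse)) / 2


def is_contradictory(direct: int, reverse: int) -> bool:
    """True when both the statement and its opposite are endorsed above the midpoint."""
    return direct > MIDPOINT and reverse > MIDPOINT


def contradiction_rate(pairs: list[tuple[int, int]]) -> float:
    """Fraction of (direct, reverse) pairs answered affirmatively on both sides."""
    return sum(is_contradictory(d, r) for d, r in pairs) / len(pairs)


# Hypothetical (direct, reverse) response pairs, for illustration only.
human_like = [(5, 1), (4, 2), (2, 4)]
llm_like = [(5, 4), (4, 5), (4, 4)]

print(pair_score(5, 1), pair_score(5, 4))                          # 5.0 3.5
print(contradiction_rate(human_like), contradiction_rate(llm_like))  # 0.0 1.0
```

Note that the contradictory pair (5, 4) still yields a moderate-looking trait score of 3.5; once aggregated, the inconsistency is invisible. This is one way in which a score computed for an LLM can look interpretable while failing to mean what the same score would mean for a human respondent.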

arXiv information

Authors	Florian E. Dorner, Tom Sühr, Samira Samadi, Augustin Kelava
Published	2023-11-09 11:54:01+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
