Boosting Theory-of-Mind Performance in Large Language Models via Prompting

Summary

Title: Boosting Theory-of-Mind Performance in Large Language Models via Prompting

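Summary:
– Large language models (LLMs) perform well on many tasks, but complex reasoning remains a challenge.
– Theory-of-Mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, so improving LLM performance in this area is important.
– This study measured the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo) and investigated how effectively in-context learning improves it.
– The authors evaluated prompts featuring two-shot chain-of-thought reasoning and step-by-step thinking instructions.
– LLMs trained with RLHF (Reinforcement Learning from Human Feedback), that is, all models except Davinci-2, improved their ToM accuracy through in-context learning.
– GPT-4 performed best in the zero-shot setting, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set.
– However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, and GPT-4 reached 100%.
– These results show that appropriate prompting improves LLM ToM reasoning, and they underscore that LLM cognitive capacities are context-dependent.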
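To make the prompt conditions in the summary concrete, here is a minimal sketch of how a zero-shot prompt, a step-by-step instruction, and a two-shot chain-of-thought prompt could be assembled. It is an illustration only, not the authors' code: the function name `build_prompt`, the `Scenario:/Question:/Answer:` layout, the instruction wording, and both worked examples are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed wording for the step-by-step instruction (hypothetical).
STEP_BY_STEP = "Let's think step by step."

# Two hypothetical worked examples: (scenario, question, reasoning and answer).
EXAMPLES = [
    (
        "Anna puts her keys in the drawer and leaves. Ben moves them to the shelf.",
        "Where will Anna look for her keys?",
        "Anna did not see Ben move the keys, so she still believes they are "
        "in the drawer. She will look in the drawer.",
    ),
    (
        "Tom sees a box labeled 'cookies'. He has not opened it. It holds pencils.",
        "What does Tom think is in the box?",
        "Tom has only seen the label, so he believes the box holds cookies.",
    ),
]


def build_prompt(
    scenario: str,
    question: str,
    examples: Sequence[Tuple[str, str, str]] = (),
    step_by_step: bool = False,
) -> str:
    """Assemble a prompt string for one ToM test item.

    With no examples and step_by_step=False this is a zero-shot prompt.
    Passing two worked examples gives a two-shot chain-of-thought prompt.
    """
    parts = []
    # Each in-context example shows the reasoning before the final answer.
    for ex_scenario, ex_question, ex_reasoning in examples:
        parts.append(
            f"Scenario: {ex_scenario}\nQuestion: {ex_question}\nAnswer: {ex_reasoning}"
        )
    # The test item ends with an open answer cue for the model to complete.
    answer_cue = "Answer:"
    if step_by_step:
        answer_cue += " " + STEP_BY_STEP
    parts.append(f"Scenario: {scenario}\nQuestion: {question}\n{answer_cue}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    item = ("Mia hides a ball under the bed while Leo is outside.",
            "Does Leo know where the ball is?")
    print(build_prompt(*item, examples=EXAMPLES, step_by_step=True))
```

The resulting string would then be sent to whichever model is under test. Keeping prompt construction separate from the model call makes it easy to compare the same test items across the zero-shot and in-context conditions.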

Abstract (original)

Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents’ beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain of thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.

arXiv information

Authors	Shima Rahimi Moghaddam, Christopher J. Honey
Published	2023-04-26 04:02:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
