Do Large Language Models Reason Causally Like Us? Even Better?

Summary

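Causal reasoning is a core component of intelligence.
Large language models (LLMs) have shown impressive capabilities in generating human-like text, which raises the question of whether their responses reflect true understanding or merely statistical patterns.
Using tasks based on collider graphs, the study compares causal reasoning in humans and four LLMs: each agent rated how likely a query variable was to occur given evidence about the other variables.
LLMs are found to reason causally along a spectrum from human-like to normative inference, with alignment shifting according to model, context, and task.
Overall, GPT-4o and Claude showed the most normative behavior, including "explaining away", whereas Gemini-Pro and GPT-3.5 did not.
All agents deviated from the expected independence of causes, Claude the least, yet they exhibited strong associative reasoning and predictive inference when assessing the likelihood of the effect given its causes.
These findings underscore the need to assess AI biases as AI systems increasingly assist human decision-making.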
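
The inference patterns named in the summary (independence of causes, explaining away, predictive inference) can be made concrete with a small worked example. The sketch below computes the normative answers for a collider graph C1 → E ← C2 by brute-force enumeration. The noisy-OR parameterization, every numeric value (`PRIOR_C1`, `PRIOR_C2`, `STRENGTH`, `LEAK`), and the helper names (`p_effect`, `joint`, `query`) are illustrative assumptions; they are not the stimuli, parameters, or code used in the paper.

```python
from itertools import product

# Hypothetical parameters chosen for illustration only (not from the paper).
PRIOR_C1 = 0.3   # P(C1 = 1)
PRIOR_C2 = 0.3   # P(C2 = 1)
STRENGTH = 0.8   # chance that one active cause produces the effect
LEAK = 0.1       # chance of the effect when no cause is active


def p_effect(c1: int, c2: int) -> float:
    """Noisy-OR likelihood P(E = 1 | C1 = c1, C2 = c2)."""
    return 1.0 - (1.0 - LEAK) * (1.0 - STRENGTH) ** (c1 + c2)


def joint(c1: int, c2: int, e: int) -> float:
    """Joint probability P(C1, C2, E) for the collider C1 -> E <- C2."""
    p_c1 = PRIOR_C1 if c1 else 1.0 - PRIOR_C1
    p_c2 = PRIOR_C2 if c2 else 1.0 - PRIOR_C2
    p_e = p_effect(c1, c2) if e else 1.0 - p_effect(c1, c2)
    return p_c1 * p_c2 * p_e


def query(target: str, evidence: dict) -> float:
    """P(target = 1 | evidence), summing over all eight possible worlds."""
    numerator = denominator = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        world = {"C1": c1, "C2": c2, "E": e}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(c1, c2, e)
        denominator += p
        if world[target] == 1:
            numerator += p
    return numerator / denominator


cases = [
    ("independence:    P(C1 | C2=1)", "C1", {"C2": 1}),
    ("independence:    P(C1 | C2=0)", "C1", {"C2": 0}),
    ("explaining away: P(C1 | E=1, C2=1)", "C1", {"E": 1, "C2": 1}),
    ("explaining away: P(C1 | E=1)", "C1", {"E": 1}),
    ("explaining away: P(C1 | E=1, C2=0)", "C1", {"E": 1, "C2": 0}),
    ("predictive:      P(E | C1=1, C2=1)", "E", {"C1": 1, "C2": 1}),
    ("predictive:      P(E | C1=1, C2=0)", "E", {"C1": 1, "C2": 0}),
]
for label, target, evidence in cases:
    print(f"{label} = {query(target, evidence):.3f}")
```

Under these assumed parameters, a normative reasoner leaves its belief in C1 unchanged when only C2 is observed, and lowers its belief in C1 once the effect is present and C2 is known to be present as well. Human or LLM ratings can be compared against such values to place them on the human-like to normative spectrum the summary describes.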

Abstract (original)

Causal reasoning is a core component of intelligence. Large language models (LLMs) have shown impressive capabilities in generating human-like text, raising questions about whether their responses reflect true understanding or statistical patterns. We compared causal reasoning in humans and four LLMs using tasks based on collider graphs, rating the likelihood of a query variable occurring given evidence from other variables. We find that LLMs reason causally along a spectrum from human-like to normative inference, with alignment shifting based on model, context, and task. Overall, GPT-4o and Claude showed the most normative behavior, including ‘explaining away’, whereas Gemini-Pro and GPT-3.5 did not. Although all agents deviated from the expected independence of causes – Claude the least – they exhibited strong associative reasoning and predictive inference when assessing the likelihood of the effect given its causes. These findings underscore the need to assess AI biases as they increasingly assist human decision-making.

arXiv information

Authors	Hanna M. Dettki, Brenden M. Lake, Charley M. Wu, Bob Rehder
Published	2025-02-14 15:09:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
