CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models

Summary

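The ability to perform causal reasoning is widely considered a core feature of intelligence.
This work investigates whether large language models (LLMs) can reason coherently about causality.
Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, and therefore cannot assess whether a model can perform causal inference in accordance with a set of well-defined formal rules.
To address this, the authors propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
They compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), symbolic questions and ground-truth answers are obtained through an oracle causal inference engine.
These are then translated into natural language.
The authors evaluate multiple LLMs on the dataset, and they introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT.
They show that the task is highly challenging for LLMs and conduct an in-depth analysis to gain deeper insight into the causal reasoning abilities of LLMs.
The data is open-sourced at https://huggingface.co/datasets/causalNLP/cladder, and the code can be found at https://github.com/causalNLP/cladder.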

Summary (original)

The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the ‘causal inference engine’ postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insight into the causal reasoning abilities of LLMs. Our data is open-sourced at https://huggingface.co/datasets/causalNLP/cladder, and our code can be found at https://github.com/causalNLP/cladder.
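The abstract distinguishes associational from interventional queries and says that ground-truth answers come from an oracle causal inference engine. The sketch below is not the authors' engine. It is a minimal illustration of why the two query types can give different numbers on the same causal graph. The graph (Z -> X, Z -> Y, X -> Y), all probabilities, the helper names, and the yes/no question format are assumptions made up for this example. Counterfactual queries are left out to keep it short.

```python
from itertools import product

# Hypothetical toy graph (not taken from CLadder): Z -> X, Z -> Y, X -> Y.
# All variables are binary; every number below is invented for illustration.
P_Z1 = 0.5                                   # P(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,   # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (0, 1): 0.6, (1, 1): 0.8}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(z, x, y):
    """Joint probability P(Z=z, X=x, Y=y) from the graph factorization."""
    return bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)


def p_y1_given_x(x):
    """Associational query: P(Y=1 | X=x), obtained by conditioning."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_do_x(x):
    """Interventional query: P(Y=1 | do(X=x)), by backdoor adjustment over Z."""
    return sum(bern(P_Z1, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


def answer_yes_no(effect):
    """Turn a signed effect into a yes/no label (assumed answer format)."""
    return "yes" if effect > 0 else "no"


association_gap = p_y1_given_x(1) - p_y1_given_x(0)
average_treatment_effect = p_y1_do_x(1) - p_y1_do_x(0)

# A hand-written verbalization of the symbolic interventional query.
question = "If we intervene to set X to 1, does Y become more likely?"
ground_truth = answer_yes_no(average_treatment_effect)

print(question, ground_truth)
print(round(association_gap, 2), round(average_treatment_effect, 2))
```

With these made-up numbers the associational gap is 0.50 while the interventional effect is 0.20, because Z confounds X and Y. A model that answers from correlation alone would overstate the effect, which is the kind of error a formal benchmark of this type is meant to expose.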

arXiv information

Authors	Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf
Published	2023-12-07 15:12:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
