CausalGraph2LLM: Evaluating LLMs for Causal Queries

Summary

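Causality is essential in scientific research because it lets researchers interpret the true relationships between variables.
These causal relationships are often represented by causal graphs, which are directed acyclic graphs.
Recent advances in large language models (LLMs) have raised interest in exploring their causal reasoning capabilities and their potential use for hypothesizing causal graphs.
These tasks require the LLM to encode the causal graph effectively for subsequent downstream tasks.
This paper introduces CausalGraph2LLM, a comprehensive benchmark of over 700k queries across diverse causal graph settings, designed to evaluate the causal reasoning capabilities of LLMs.
The causal queries fall into two types: graph-level queries and node-level queries.
The study benchmarks both open-source and proprietary models.
The findings show that while LLMs are promising in this domain, they are highly sensitive to the encoding used.
Even capable models such as GPT-4 and Gemini-1.5 are sensitive to the encoding, with deviations of about 60%.
The authors further demonstrate this sensitivity on downstream causal intervention tasks.
They also observe that LLMs can often display biases when presented with contextual information about a causal graph, potentially stemming from their parametric memory.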
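
To make "the encoding used" concrete, here is a minimal Python sketch of how one causal graph might be serialized into two different text encodings and paired with a node-level query. Everything in it is an assumption for illustration: the toy graph, the function names (`encode_edge_list`, `encode_sentences`, `node_level_prompt`), and the prompt wording are hypothetical and are not taken from the CausalGraph2LLM benchmark.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (cause, effect)

# Hypothetical toy causal graph (a DAG); not from the benchmark.
EDGES: List[Edge] = [
    ("smoking", "tar"),
    ("tar", "cancer"),
    ("smoking", "cancer"),
]


def encode_edge_list(edges: List[Edge]) -> str:
    """Encoding A (assumed): one 'cause -> effect' pair per line."""
    return "\n".join(f"{cause} -> {effect}" for cause, effect in edges)


def encode_sentences(edges: List[Edge]) -> str:
    """Encoding B (assumed): one natural-language sentence per edge."""
    return " ".join(f"{cause} causes {effect}." for cause, effect in edges)


def children(edges: List[Edge], node: str) -> List[str]:
    """Ground truth for a node-level query: direct effects of `node`."""
    return sorted(effect for cause, effect in edges if cause == node)


def parents(edges: List[Edge], node: str) -> List[str]:
    """Ground truth for a node-level query: direct causes of `node`."""
    return sorted(cause for cause, effect in edges if effect == node)


def node_level_prompt(encoded_graph: str, node: str) -> str:
    """Build an illustrative node-level prompt; the wording is hypothetical."""
    return (
        f"{encoded_graph}\n\n"
        f"Question: Which variables are direct effects of {node}?"
    )


if __name__ == "__main__":
    # The same graph and question under two encodings.
    for encode in (encode_edge_list, encode_sentences):
        print(node_level_prompt(encode(EDGES), "smoking"))
        print("---")
    print("Expected answer:", children(EDGES, "smoking"))
```

Both prompts describe the same graph and have the same correct answer, so any difference in a model's accuracy between them would reflect sensitivity to the encoding rather than to the graph itself.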

Summary (original)

Causality is essential in scientific research, enabling researchers to interpret true relationships between variables. These causal relationships are often represented by causal graphs, which are directed acyclic graphs. With the recent advancements in Large Language Models (LLMs), there is an increasing interest in exploring their capabilities in causal reasoning and their potential use to hypothesize causal graphs. These tasks necessitate the LLMs to encode the causal graph effectively for subsequent downstream tasks. In this paper, we introduce CausalGraph2LLM, a comprehensive benchmark comprising over 700k queries across diverse causal graph settings to evaluate the causal reasoning capabilities of LLMs. We categorize the causal queries into two types: graph-level and node-level queries. We benchmark both open-sourced and propriety models for our study. Our findings reveal that while LLMs show promise in this domain, they are highly sensitive to the encoding used. Even capable models like GPT-4 and Gemini-1.5 exhibit sensitivity to encoding, with deviations of about $60\%$. We further demonstrate this sensitivity for downstream causal intervention tasks. Moreover, we observe that LLMs can often display biases when presented with contextual information about a causal graph, potentially stemming from their parametric memory.

arXiv information

Authors	Ivaxi Sheth, Bahare Fatemi, Mario Fritz
Published	2025-02-18 17:19:24+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
