CELLO: Causal Evaluation of Large Vision-Language Models

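Abstract

Causal reasoning is the foundation of human intelligence and is essential for making effective decisions in real-world environments.
Despite recent advances in large vision-language models (LVLMs), their ability to understand causality remains unclear.
Previous research has typically focused on commonsense causal relationships between events and/or actions, which is insufficient for applications such as embodied agents and lacks the explicitly defined causal graphs required for formal causal reasoning.
To overcome these limitations, we introduce a fine-grained, unified definition of causality involving interactions between humans and/or objects.
Building on this definition, we construct CELLO, a new dataset of 14,094 causal questions spanning all four levels of causality: discovery, association, intervention, and counterfactual.
The dataset goes beyond traditional commonsense causality by including explicit causal graphs that detail the interactions between humans and objects.
Extensive experiments on CELLO show that current LVLMs still struggle with causal reasoning tasks, but they can benefit significantly from CELLO-CoT, our proposed causally inspired chain-of-thought prompting strategy.
Both the quantitative and qualitative analyses in this study provide valuable insights for future research.
Our project page is at https://github.com/OpenCausaLab/CELLO.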

Abstract (original)

Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and lacks the explicitly defined causal graphs required for formal causal reasoning. To overcome these limitations, we introduce a fine-grained and unified definition of causality involving interactions between humans and/or objects. Building on the definition, we construct a novel dataset, CELLO, consisting of 14,094 causal questions across all four levels of causality: discovery, association, intervention, and counterfactual. This dataset surpasses traditional commonsense causality by including explicit causal graphs that detail the interactions between humans and objects. Extensive experiments on CELLO reveal that current LVLMs still struggle with causal reasoning tasks, but they can benefit significantly from our proposed CELLO-CoT, a causally inspired chain-of-thought prompting strategy. Both quantitative and qualitative analyses from this study provide valuable insights for future research. Our project page is at https://github.com/OpenCausaLab/CELLO.
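As a rough illustration of what it means to ask questions at all four levels of causality over an explicit causal graph of a human-object interaction, the Python sketch below uses a hypothetical toy scene: a person holds a tray, and a cup rests on the tray. The scene, the variable names (`holds`, `tray_up`, `cup_up`), the deterministic structural causal model, and the helper functions (`simulate`, `edges`, `explain_cup_fell`, `intervene_tray_up`, `counterfactual_cup_up`) are all assumptions made for illustration; they are not CELLO's data format, its question templates, or the CELLO-CoT prompting strategy.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mechanism = Optional[Callable[[Dict[str, bool]], bool]]

# Hypothetical toy scene (not from CELLO): a person holds a tray with a cup on it.
# Each variable maps to (parents, mechanism); None marks the exogenous root.
GRAPH: Dict[str, Tuple[Tuple[str, ...], Mechanism]] = {
    "holds": ((), None),                               # does the person hold the tray?
    "tray_up": (("holds",), lambda v: v["holds"]),     # tray stays up iff it is held
    "cup_up": (("tray_up",), lambda v: v["tray_up"]),  # cup stays up iff the tray is up
}
ORDER = ["holds", "tray_up", "cup_up"]  # a topological order of GRAPH


def simulate(holds: bool, do: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Evaluate the model; variables in `do` are fixed and ignore their parents."""
    do = do or {}
    values: Dict[str, bool] = {}
    for name in ORDER:
        _, mechanism = GRAPH[name]
        if name in do:
            values[name] = do[name]       # intervention: cut incoming edges
        elif mechanism is None:
            values[name] = holds          # exogenous context
        else:
            values[name] = mechanism(values)
    return values


# 1. Discovery: read the causal edges off the explicit graph.
def edges() -> List[Tuple[str, str]]:
    return [(p, child) for child, (parents, _) in GRAPH.items() for p in parents]


# 2. Association: which contexts are consistent with observing the cup fall?
def explain_cup_fell() -> List[bool]:
    return [h for h in (True, False) if not simulate(h)["cup_up"]]


# 3. Intervention: force the tray up (e.g. propped on a table) with no one holding it.
def intervene_tray_up() -> Dict[str, bool]:
    return simulate(holds=False, do={"tray_up": True})


# 4. Counterfactual: the cup fell; would it have stayed up had the person held the tray?
def counterfactual_cup_up() -> bool:
    (observed_holds,) = explain_cup_fell()                # abduction: recover the context
    world = simulate(observed_holds, do={"holds": True})  # action, then prediction
    return world["cup_up"]


if __name__ == "__main__":
    print(edges())
    print(explain_cup_fell())
    print(intervene_tray_up())
    print(counterfactual_cup_up())
```

Running the script prints the two edges of the toy graph, `[False]` for the association query, the post-intervention values, and `True` for the counterfactual. The `do` argument models an intervention by ignoring a variable's parents, which is what separates the third level from merely conditioning on an observation at the second.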

arXiv information

Authors	Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu
Published	2024-06-27 12:34:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google