Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data



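An extensive literature analysis yields a structured taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types.
To address the lack of comprehensive evaluations, the study conducts an extensive empirical assessment of 29 causal discovery algorithms on multiple synthetic and real-world datasets.
Using five evaluation metrics, it categorizes synthetic datasets by size, linearity, and noise distribution, and summarizes top-3 algorithm recommendations as guidelines for users in various data scenarios.
In addition, a metadata extraction strategy with an accuracy exceeding 80% is developed to assist users in algorithm selection on unknown datasets.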


Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented in two ways: methodologies are inconsistent, with no universal classification standard for existing methods, and comprehensive evaluations are lacking, since data characteristics are rarely analyzed jointly when algorithms are benchmarked. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods on numerical data, aiming to provide a clearer and more structured understanding of the field. Our research begins with a comprehensive literature review spanning over two decades, analyzing over 200 academic articles and identifying more than 40 representative algorithms. This extensive analysis leads to the development of a structured taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. To address the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of 29 causal discovery algorithms on multiple synthetic and real-world datasets. We categorize synthetic datasets based on size, linearity, and noise distribution, employ five evaluation metrics, and summarize the top-3 algorithm recommendations, providing guidelines for users in various data scenarios. Our results highlight a significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy with an accuracy exceeding 80% is developed to assist users in algorithm selection on unknown datasets. Based on these insights, we offer practical guidelines to help users choose the most suitable causal discovery methods for their specific dataset.
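The abstract does not name its five evaluation metrics. As an illustration only, the sketch below shows how an estimated causal graph might be scored against a ground-truth graph using structural Hamming distance (SHD) and edge precision/recall, which are commonly used in this area; whether the paper uses these exact metrics is an assumption. The function names and the adjacency-matrix convention (`adj[i][j] == 1` means an edge from `i` to `j`) are hypothetical choices for this example.

```python
from typing import List, Tuple

# Assumed convention: adj[i][j] == 1 means a directed edge i -> j.
Adjacency = List[List[int]]


def structural_hamming_distance(true_adj: Adjacency, est_adj: Adjacency) -> int:
    """Count node pairs whose edge status differs between the two graphs.

    A missing, extra, or reversed edge each counts once. This is one common
    convention; some works count a reversal as two errors.
    """
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                distance += 1
    return distance


def edge_precision_recall(true_adj: Adjacency, est_adj: Adjacency) -> Tuple[float, float]:
    """Precision and recall over directed edges (direction must match)."""
    n = len(true_adj)
    true_positive = sum(
        1 for i in range(n) for j in range(n) if true_adj[i][j] and est_adj[i][j]
    )
    n_estimated = sum(1 for row in est_adj for v in row if v)
    n_true = sum(1 for row in true_adj for v in row if v)
    precision = true_positive / n_estimated if n_estimated else 0.0
    recall = true_positive / n_true if n_true else 0.0
    return precision, recall


# Toy example: the true graph is the chain 0 -> 1 -> 2.
TRUE_GRAPH = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Hypothetical algorithm output: 0 -> 1 is correct, 1 -> 2 is reversed,
# and 0 -> 2 is a spurious extra edge.
ESTIMATED_GRAPH = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]

if __name__ == "__main__":
    shd = structural_hamming_distance(TRUE_GRAPH, ESTIMATED_GRAPH)
    precision, recall = edge_precision_recall(TRUE_GRAPH, ESTIMATED_GRAPH)
    print(f"SHD={shd} precision={precision:.3f} recall={recall:.3f}")
```

Scores like these, computed per algorithm and per dataset category (size, linearity, noise distribution), are the kind of input from which a top-3 ranking could be derived. The ranking procedure itself is not described in the abstract and is not sketched here.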


Authors: Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li
Published: 2024-09-04 13:13:03+00:00

Category: cs.AI