Axiomatic Aggregations of Abductive Explanations

Summary

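Recent criticisms of the robustness of post hoc model approximation explanation methods (such as LIME and SHAP) have led to the rise of model-precise abductive explanations.
For each data point, an abductive explanation provides a minimal subset of features that is sufficient to generate the outcome.
Although theoretically sound and rigorous, abductive explanations have a major problem: the same data point can have several valid abductive explanations.
In such cases, providing a single abductive explanation can be insufficient.
On the other hand, providing all valid abductive explanations can be incomprehensible because of their size.
This work addresses the problem by aggregating the many possible abductive explanations into feature importance scores.
We propose three aggregation methods: two are based on power indices from cooperative game theory, and the third is based on a well-known measure of causal strength.
We characterize these three methods axiomatically and show that each one uniquely satisfies a set of desirable properties.
We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.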
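
To make the definitions above concrete, the following is a minimal sketch, not the paper's implementation. It enumerates every abductive explanation (subset-minimal sufficient feature set) of a tiny classifier by brute force and then aggregates them into per-feature scores. The names `toy_model`, `is_sufficient`, `abductive_explanations`, and `membership_share` are hypothetical, and the aggregate used here (the fraction of explanations that contain each feature) is an illustrative assumption; the summary does not name the three indices, so this should not be read as any one of them.

```python
from itertools import combinations, product


def is_sufficient(model, x, subset, domains):
    """Return True if fixing the features in `subset` to their values in x
    forces the prediction model(x), whatever the other features take."""
    target = model(x)
    free = [i for i in range(len(x)) if i not in subset]
    for values in product(*(domains[i] for i in free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if model(tuple(z)) != target:
            return False
    return True


def abductive_explanations(model, x, domains):
    """Enumerate all subset-minimal sufficient feature sets by brute force.
    Exponential in the number of features, so only usable on toy inputs."""
    found = []
    n = len(x)
    for size in range(n + 1):  # smaller subsets first
        for subset in combinations(range(n), size):
            s = frozenset(subset)
            if any(e <= s for e in found):
                continue  # contains a smaller explanation, so not minimal
            if is_sufficient(model, x, s, domains):
                found.append(s)
    return found


def membership_share(explanations, n_features):
    """Illustrative aggregate (an assumption, not the paper's indices):
    the fraction of explanations that contain each feature."""
    total = len(explanations)
    return [sum(i in e for e in explanations) / total for i in range(n_features)]


def toy_model(x):
    # Hypothetical classifier: predicts 1 when x0 and (x1 or x2) hold.
    return int((x[0] and x[1]) or (x[0] and x[2]))


x = (1, 1, 1)
domains = [(0, 1)] * 3  # every feature is binary in this toy setup
explanations = abductive_explanations(toy_model, x, domains)
scores = membership_share(explanations, len(x))

print(sorted(sorted(e) for e in explanations))  # [[0, 1], [0, 2]]
print(scores)                                   # [1.0, 0.5, 0.5]
```

The point `x` has two valid abductive explanations, {0, 1} and {0, 2}, so reporting only one of them would hide that feature 0 is needed in both. The aggregated scores surface that difference in a single vector.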

Abstract (original)

The recent criticisms of the robustness of post hoc model approximation explanation methods (like LIME and SHAP) have led to the rise of model-precise abductive explanations. For each data point, abductive explanations provide a minimal subset of features that are sufficient to generate the outcome. While theoretically sound and rigorous, abductive explanations suffer from a major issue — there can be several valid abductive explanations for the same data point. In such cases, providing a single abductive explanation can be insufficient; on the other hand, providing all valid abductive explanations can be incomprehensible due to their size. In this work, we solve this issue by aggregating the many possible abductive explanations into feature importance scores. We propose three aggregation methods: two based on power indices from cooperative game theory and a third based on a well-known measure of causal strength. We characterize these three methods axiomatically, showing that each of them uniquely satisfies a set of desirable properties. We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.

arXiv information

Authors	Gagan Biradar, Yacine Izza, Elita Lobo, Vignesh Viswanathan, Yair Zick
Published	2023-10-12 17:02:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
