Trust Regions for Explanations via Black-Box Probabilistic Certification

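Summary

Given the black-box nature of machine learning models, many explainability methods have been developed to decipher the factors behind individual decisions.
In this paper, we introduce the new problem of black-box (probabilistic) explanation certification.
We ask: given a black-box model with only query access, an explanation for an example, and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., an $\ell_{\infty}$ ball) centered at the example such that, when the explanation is applied to all examples within the hypercube, a quality criterion is met with high probability (viz. fidelity greater than some value)?
Being able to find such a \emph{trust region} efficiently has several benefits: i) insight into model behavior within a \emph{region}, with a \emph{guarantee};
ii) ascertained \emph{stability} of the explanation;
iii) \emph{explanation reuse}, which saves time, energy, and money because explanations need not be found for every example; and
iv) a possible \emph{meta-metric} for comparing explanation methods.
Our contributions include formalizing this problem, proposing solutions, providing computable theoretical guarantees for these solutions, and experimentally demonstrating their efficacy on synthetic and real data.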

Summary (original)

Given the black box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce a novel problem of black box (probabilistic) explanation certification. We ask the question: Given a black box model with only query access, an explanation for an example and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, (with high probability) a quality criterion is met (viz. fidelity greater than some value)? Being able to efficiently find such a \emph{trust region} has multiple benefits: i) insight into model behavior in a \emph{region}, with a \emph{guarantee}; ii) ascertained \emph{stability} of the explanation; iii) \emph{explanation reuse}, which can save time, energy and money by not having to find explanations for every example; and iv) a possible \emph{meta-metric} to compare explanation methods. Our contributions include formalizing this problem, proposing solutions, providing theoretical guarantees for these solutions that are computable, and experimentally showing their efficacy on synthetic and real data.
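To make the question in the abstract concrete, the following is a minimal Python sketch of a naive sampling-based search for the half-width of an $\ell_{\infty}$ ball around an example. It is not the paper's method and carries none of its guarantees: it simply samples points uniformly in a candidate hypercube, checks a quality criterion at each, and bisects on the half-width. The names `region_passes`, `largest_half_width`, and `toy_fidelity`, along with all parameters, are hypothetical and introduced only for illustration.

```python
import random


def region_passes(quality, center, half_width, threshold, n_samples, rng):
    """Return True if every sampled point in the hypercube meets the threshold.

    `quality` is an assumed callable mapping a point (list of floats) to a
    score such as fidelity of the fixed explanation at that point.
    """
    for _ in range(n_samples):
        # Draw a point uniformly from the l_inf ball of the given half-width.
        x = [c + rng.uniform(-half_width, half_width) for c in center]
        if quality(x) < threshold:
            return False
    return True


def largest_half_width(quality, center, threshold, w_max,
                       n_samples=200, n_iters=20, seed=0):
    """Bisect on the half-width in [0, w_max] using the sampling check.

    Bisection relies on nesting: a smaller hypercube is contained in a
    larger one. The sampling check is only a heuristic, so the returned
    value is an estimate, not a certificate.
    """
    rng = random.Random(seed)
    lo, hi = 0.0, w_max
    for _ in range(n_iters):
        mid = (lo + hi) / 2
        if region_passes(quality, center, mid, threshold, n_samples, rng):
            lo = mid  # no violation found: try a larger region
        else:
            hi = mid  # violation found: shrink the region
    return lo


def toy_fidelity(x, center=(0.0, 0.0)):
    """Toy quality score that decays with l_inf distance from the center."""
    return 1.0 - max(abs(xi - ci) for xi, ci in zip(x, center))


estimate = largest_half_width(toy_fidelity, [0.0, 0.0], threshold=0.7, w_max=1.0)
print(estimate)
```

For the toy score, the criterion `toy_fidelity(x) >= 0.7` holds exactly on the hypercube of half-width 0.3, so the estimate should land near that value. Because a finite sample can miss rare violations, a check of this kind tends to overestimate the region slightly, which is the gap that a probabilistic certification with explicit guarantees is meant to address.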

arXiv information

Authors	Amit Dhurandhar, Swagatam Haldar, Dennis Wei, Karthikeyan Natesan Ramamurthy
Published	2024-06-05 16:36:21+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
