Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

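Summary

Explainable AI turns the opaque decision strategies of ML models into explanations a user can interpret, for example by identifying how much each input feature contributes to the prediction at hand.
Such explanations, however, entangle the multiple factors that may enter into the overall, complex decision strategy.
The authors propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract, human-understandable concepts and that allow joint attribution over concepts and input features.
To extract the desired representation automatically, they propose new subspace analysis formulations that extend the principles of PCA and subspace analysis to explanations.
These new analyses, called principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize the relevance of the projected activations rather than the more traditional variance or kurtosis.
This allows a much stronger focus on subspaces that are truly relevant to the prediction and the explanation, in particular by ignoring activations or concepts to which the prediction model is invariant.
The approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP.
As demonstrated on benchmarks and three use cases, the proposed methods are practically useful and compare favorably with the state of the art.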

Abstract (original)

Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods show to be practically useful and compare favorably to the state of the art as demonstrated on benchmarks and three use cases.
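
The idea of optimizing relevance instead of variance can be illustrated with a small sketch. This is not taken from the paper: it assumes that the relevance of a sample can be written as an inner product between its activation vector and a second "context" vector (for instance, a gradient-like quantity), and it picks the orthonormal basis that keeps the most of that inner product on average. The function names `relevance_subspace` and `projected_relevance`, and the synthetic data, are hypothetical and exist only for illustration.

```python
import numpy as np


def relevance_subspace(acts, ctx, d):
    """Return an orthonormal basis (n_features, d) that maximizes the mean
    projected relevance, ASSUMING relevance = <activation, context>.

    acts, ctx: arrays of shape (n_samples, n_features).
    """
    n = acts.shape[0]
    cross = acts.T @ ctx / n              # mean outer product of activations and context
    sym = 0.5 * (cross + cross.T)         # symmetrize so eigenvectors are orthogonal
    eigvals, eigvecs = np.linalg.eigh(sym)
    top = np.argsort(eigvals)[::-1][:d]   # indices of the d largest eigenvalues
    return eigvecs[:, top]


def projected_relevance(acts, ctx, basis):
    """Mean inner product of activations and context after projecting both onto basis."""
    return float(np.mean(np.sum((acts @ basis) * (ctx @ basis), axis=1)))


# Synthetic illustration (hypothetical data): dimensions 2..5 have large
# variance but carry no relevance, because their context entries are zero.
rng = np.random.default_rng(0)
scales = np.array([1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
acts = rng.normal(size=(500, 6)) * scales
ctx = np.zeros_like(acts)
ctx[:, :2] = acts[:, :2]                  # only the first two dimensions are relevant

# Relevance-based basis versus a variance-based (PCA-style, uncentered) basis.
u_rel = relevance_subspace(acts, ctx, d=2)
cov_vals, cov_vecs = np.linalg.eigh(acts.T @ acts / acts.shape[0])
u_pca = cov_vecs[:, np.argsort(cov_vals)[::-1][:2]]

print("relevance kept, relevance-based basis:", projected_relevance(acts, ctx, u_rel))
print("relevance kept, variance-based basis: ", projected_relevance(acts, ctx, u_pca))
```

In this toy setting the variance-based basis follows the high-variance directions, which the assumed model ignores, while the relevance-based basis is chosen to retain as much of the inner product as possible. Whether this matches the exact PRCA objective should be checked against the paper itself.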

arXiv information

Authors	Pattarawat Chormai, Jan Herrmann, Klaus-Robert Müller, Grégoire Montavon
Published	2022-12-30 18:04:25+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
