Evaluation of Interpretability Methods and Perturbation Artifacts in Deep Neural Networks

Abstract

Despite the excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how DNNs make a given decision remains an open problem, which has led to a large number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, because ground truth is lacking and interpretability methods have diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluating interpretability methods is to perturb the input features deemed important for a given prediction and observe the decrease in accuracy. However, the perturbation itself may introduce artifacts, since perturbed images may be out-of-distribution (OOD). In this paper, we conducted computational experiments to estimate the contribution of perturbation artifacts and developed a method to estimate the fidelity of interpretability methods. We demonstrate that, while perturbation artifacts do exist, their impact on fidelity estimation can be minimized and characterized by using the model accuracy curves obtained from perturbing input features in the Most Import First (MIF) and Least Import First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation on four popular post-hoc interpretability methods.
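The perturbation-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear classifier, the gradient-times-input saliency, the zero fill value, the helper name `perturbation_curve`, and the mean LIF-minus-MIF gap used as a summary are all assumptions made for illustration, and the sketch does not attempt to correct for OOD artifacts.

```python
import numpy as np

def perturbation_curve(predict, images, labels, saliency, fractions,
                       order="MIF", fill=0.0):
    """Accuracy after replacing a fraction of pixels, ranked by saliency, with `fill`."""
    n = images.shape[0]
    flat_sal = saliency.reshape(n, -1)
    # Rank pixels per image: most important first (MIF) or least important first (LIF).
    key = -flat_sal if order == "MIF" else flat_sal
    ranks = np.argsort(key, axis=1, kind="stable")
    rows = np.arange(n)[:, None]
    accuracies = []
    for frac in fractions:
        k = int(round(frac * flat_sal.shape[1]))  # number of pixels to perturb
        perturbed = images.reshape(n, -1).copy()
        perturbed[rows, ranks[:, :k]] = fill
        preds = predict(perturbed.reshape(images.shape))
        accuracies.append(float(np.mean(preds == labels)))
    return np.array(accuracies)

# Toy setup (hypothetical): 4x4 "images"; only the first row affects the class.
rng = np.random.default_rng(0)
images = rng.random((200, 4, 4))
weights = np.zeros(16)
weights[:4] = 1.0

def predict(x):
    # Class 1 if the weighted pixel sum exceeds a fixed threshold, else class 0.
    return (x.reshape(x.shape[0], -1) @ weights > 2.0).astype(int)

labels = predict(images)  # unperturbed accuracy is 1.0 by construction
saliency = images * weights.reshape(4, 4)  # gradient-times-input for a linear model

fractions = np.linspace(0.0, 1.0, 5)
mif = perturbation_curve(predict, images, labels, saliency, fractions, order="MIF")
lif = perturbation_curve(predict, images, labels, saliency, fractions, order="LIF")

# Assumed summary: the average gap between the LIF and MIF accuracy curves.
gap = float(np.mean(lif - mif))
print("MIF:", mif)
print("LIF:", lif)
print("mean LIF - MIF gap:", gap)
```

In this toy case the saliency map is exact, so accuracy drops quickly under the MIF order and stays high under the LIF order until the informative pixels are reached. Both curves share the same perturbation procedure, which is why comparing them is one plausible way to separate the effect of the feature ranking from the effect of the perturbation itself.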

arXiv information

Authors	Lennart Brocki, Neo Christopher Chung
Published	2023-03-06 15:26:24+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
