Fidelity of Interpretability Methods and Perturbation Artifacts in Neural Networks

Summary

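Deep neural networks (DNNs) perform very well in image classification, detection, and prediction, but characterizing how a DNN arrives at a particular decision remains an open problem, and as a result many interpretability methods have been proposed.
Post-hoc interpretability methods mainly aim to quantify the importance of input features with respect to the class probabilities.
However, evaluating these methods is a serious challenge, both because ground truth is lacking and because the available interpretability methods have diverse operating characteristics.
A common way to evaluate an interpretability method is to perturb the input features it deems important for a given prediction and observe how much the accuracy drops.
However, the perturbation itself can introduce artifacts.
We propose a method for estimating the impact of such artifacts on fidelity estimation, using model accuracy curves obtained by perturbing input features in Most Important First (MIF) and Least Important First (LIF) order.
Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation for four popular post-hoc interpretability methods.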
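The perturbation-based evaluation described above can be sketched as follows: rank each input's features by an attribution map, overwrite a growing fraction of them in MIF or LIF order, and record the model's accuracy at each step. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the `model` callable interface, the constant `baseline` written into perturbed features, and the `fractions` grid are hypothetical choices, and `toy_model` is a stand-in for a real classifier such as ResNet-50.

```python
import numpy as np

def perturbation_curve(model, images, labels, attributions,
                       order="MIF", fractions=(0.0, 0.25, 0.5), baseline=0.0):
    """Accuracy of `model` as a growing fraction of input features is replaced.

    model: callable mapping a batch shaped like `images` to (N, C) class scores
           (a hypothetical interface, not a specific library's API).
    attributions: importance scores with the same shape as `images`.
    order: "MIF" perturbs the highest-scoring features first, "LIF" the lowest.
    baseline: value written into perturbed features (an assumed choice).
    """
    n = images.shape[0]
    flat_imgs = images.reshape(n, -1)
    flat_attr = attributions.reshape(n, -1)
    d = flat_imgs.shape[1]

    # argsort is ascending, i.e. least important first; reverse it for MIF
    ranking = np.argsort(flat_attr, axis=1)
    if order == "MIF":
        ranking = ranking[:, ::-1]
    elif order != "LIF":
        raise ValueError("order must be 'MIF' or 'LIF'")

    rows = np.arange(n)[:, None]
    accuracies = []
    for frac in fractions:
        k = int(round(frac * d))
        perturbed = flat_imgs.copy()
        perturbed[rows, ranking[:, :k]] = baseline  # overwrite the k first-ranked features
        preds = np.asarray(model(perturbed.reshape(images.shape))).argmax(axis=1)
        accuracies.append(float(np.mean(preds == labels)))
    return np.array(accuracies)


# Toy demo: the hypothetical "model" predicts class 1 when any pixel exceeds 0.5.
def toy_model(x):
    peak = x.reshape(len(x), -1).max(axis=1)
    return np.stack([np.full_like(peak, 0.5), peak], axis=1)

images = np.full((4, 2, 2), 0.1)
images[0, 0, 0] = 1.0   # class-1 images carry one bright pixel
images[1, 1, 1] = 1.0
labels = np.array([1, 1, 0, 0])
attributions = images.copy()  # pretend the explanation highlights bright pixels

acc_mif = perturbation_curve(toy_model, images, labels, attributions, "MIF")
acc_lif = perturbation_curve(toy_model, images, labels, attributions, "LIF")
```

In the toy demo, removing the bright pixel first (MIF) flips the class-1 predictions, so accuracy falls to 0.5, while removing dim pixels first (LIF) leaves accuracy at 1.0. A faithful attribution map is expected to produce this kind of separation between the two curves.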

Summary (original)

Despite excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how DNNs make a given decision remains an open problem, resulting in a number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, due to the lack of ground truth and the existence of interpretability methods with diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluate interpretability methods is to perturb input features deemed important for a given prediction and observe the decrease in accuracy. However, perturbation itself may introduce artifacts. We propose a method for estimating the impact of such artifacts on the fidelity estimation by utilizing model accuracy curves from perturbing input features according to the Most Important First (MIF) and Least Important First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation of four popular post-hoc interpretability methods.
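The abstract does not spell out how the MIF and LIF accuracy curves are combined into a fidelity estimate. One plausible reading, assumed here rather than taken from the paper, is to compare the areas under the two curves: an artifact that lowers accuracy regardless of removal order shifts both curves, and to the extent both shift by the same amount, that shift cancels in their difference. The sketch below computes that gap with the trapezoidal rule; the names `area_under` and `mif_lif_gap` are hypothetical, and the curves in the usage lines are made-up numbers, not results from the paper.

```python
import numpy as np

def area_under(fractions, accuracies):
    """Trapezoidal area under an accuracy curve.

    Written out explicitly to avoid the np.trapz / np.trapezoid naming
    difference between NumPy versions.
    """
    x = np.asarray(fractions, dtype=float)
    y = np.asarray(accuracies, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def mif_lif_gap(fractions, acc_mif, acc_lif):
    """Hypothetical score: area between the LIF and MIF accuracy curves.

    Larger values mean that removing the features ranked most important
    hurts accuracy much more than removing those ranked least important.
    """
    return area_under(fractions, acc_lif) - area_under(fractions, acc_mif)

# Illustrative, made-up curves (not results from the paper)
fractions = [0.0, 0.5, 1.0]
gap = mif_lif_gap(fractions, acc_mif=[1.0, 0.5, 0.0], acc_lif=[1.0, 1.0, 1.0])
```

An attribution map that ranked features at random would be expected to give MIF and LIF curves of similar shape and hence a gap near zero, which is why a difference of the two curves, rather than the MIF curve alone, is a natural candidate under this assumption.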

arXiv information

Authors	Lennart Brocki, Neo Christopher Chung
Published	2023-09-12 15:00:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
