Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property

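Abstract

Techniques that explain the predictions of black-box machine learning models are important for making the models transparent and thereby increasing trust in AI systems.
The input features of a model often have a nested structure made up of high-level and low-level features, where each high-level feature decomposes into multiple low-level features.
For such inputs, both high-level feature attributions (HiFAs) and low-level feature attributions (LoFAs) are important for understanding the model's decision more deeply.
This paper proposes a model-agnostic local explanation method that effectively exploits the nested structure of the input to estimate the two levels of feature attributions simultaneously.
The key idea of the proposed method is to introduce a consistency property that should hold between the HiFAs and LoFAs, thereby bridging the separate optimization problems used to estimate them.
Thanks to this consistency property, the proposed method can produce HiFAs and LoFAs that are faithful to the black-box model and consistent with each other, while using fewer queries to the model.
Experiments on image classification in multiple instance learning and on text classification with language models show that the HiFAs and LoFAs estimated by the proposed method are accurate, faithful to the behavior of the black-box models, and provide consistent explanations.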

Abstract (original)

Techniques that explain the predictions of black-box machine learning models are crucial to make the models transparent, thereby increasing trust in AI systems. The input features to the models often have a nested structure that consists of high- and low-level features, and each high-level feature is decomposed into multiple low-level features. For such inputs, both high-level feature attributions (HiFAs) and low-level feature attributions (LoFAs) are important for better understanding the model’s decision. In this paper, we propose a model-agnostic local explanation method that effectively exploits the nested structure of the input to estimate the two-level feature attributions simultaneously. A key idea of the proposed method is to introduce the consistency property that should exist between the HiFAs and LoFAs, thereby bridging the separate optimization problems for estimating them. Thanks to this consistency property, the proposed method can produce HiFAs and LoFAs that are both faithful to the black-box models and consistent with each other, using a smaller number of queries to the models. In experiments on image classification in multiple instance learning and text classification using language models, we demonstrate that the HiFAs and LoFAs estimated by the proposed method are accurate, faithful to the behaviors of the black-box models, and provide consistent explanations.
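The abstract does not spell out the optimization, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes (1) a LIME-style linear surrogate fitted at each level on randomly masked inputs, (2) masking by zeroing features, and (3) that "consistency" means each HiFA should equal the sum of the LoFAs of its constituent low-level features, enforced here with a soft quadratic penalty. The function name `estimate_nested_attributions` and the parameters `n_samples`, `lam`, and `seed` are hypothetical.

```python
import numpy as np

def estimate_nested_attributions(f, x, n_samples=200, lam=10.0, seed=0):
    """Sketch: jointly fit high- and low-level linear surrogates.

    f : callable taking a list of 1-D arrays (one per high-level feature)
        and returning a scalar prediction (the black-box model).
    x : list of 1-D arrays; x[j] holds the low-level features of group j.
    Returns (hifa, lofa) as 1-D arrays of length J and D.
    """
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in x]
    J, D = len(sizes), sum(sizes)
    bounds = np.cumsum([0] + sizes)

    # M[j, d] = 1 if low-level feature d belongs to high-level feature j.
    M = np.zeros((J, D))
    for j in range(J):
        M[j, bounds[j]:bounds[j + 1]] = 1.0

    flat = np.concatenate(x)

    def query(low_mask):
        # Assumption: a masked-out feature is replaced by zero.
        return f(np.split(flat * low_mask, bounds[1:-1]))

    # Random binary masks at each level, and the model outputs for them.
    Zh = rng.integers(0, 2, size=(n_samples, J)).astype(float)
    Zl = rng.integers(0, 2, size=(n_samples, D)).astype(float)
    yh = np.array([query(z @ M) for z in Zh])  # z @ M expands a group mask
    yl = np.array([query(z) for z in Zl])

    # Unknowns: [hifa (J), lofa (D), intercept_high, intercept_low].
    ones, zeros = np.ones((n_samples, 1)), np.zeros((n_samples, 1))
    A = np.vstack([
        np.hstack([Zh, np.zeros((n_samples, D)), ones, zeros]),  # high-level fit
        np.hstack([np.zeros((n_samples, J)), Zl, zeros, ones]),  # low-level fit
        np.sqrt(lam) * np.hstack([np.eye(J), -M, np.zeros((J, 2))]),  # hifa ~ M @ lofa
    ])
    b = np.concatenate([yh, yl, np.zeros(J)])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:J], theta[J:J + D]
```

Stacking the consistency rows into the same least-squares system is what couples the two otherwise separate fits: evidence gathered at one level constrains the attributions at the other. A hard constraint or a different solver would be equally plausible readings of the abstract.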

arXiv information

Authors	Yuya Yoshikawa, Masanari Kimura, Ryotaro Shimizu, Yuki Saito
Published	2024-05-23 13:03:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
