Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation Framework

Summary

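Explaining the decision-making process of machine learning models is important for ensuring their reliability and transparency for end users.
One popular form of explanation highlights important input features: i) tokens (e.g., Shapley values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions).
However, these explanation types have only been studied in isolation, which makes it difficult to judge when each one is applicable.
To bridge this gap, we develop a unified framework, consisting of four diagnostic properties, that enables an automated, direct comparison between highlight and interactive explanations.
We conduct an extensive analysis of these three types of input feature explanations, each generated with three different explanation techniques, across two datasets and two models, and show that each explanation type has distinct strengths across the diagnostic properties.
Nevertheless, interactive span explanations outperform the other types of input feature explanations on most diagnostic properties.
These explanation types remain relatively understudied, and our analysis underscores the need for further research to improve the methods that generate them.
Additionally, integrating them with other explanation types that perform better on particular properties could further improve their overall effectiveness.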

Abstract (original)

Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and transparency for end users. One popular explanation form highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and Attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we develop a unified framework that facilitates an automated and direct comparison between highlight and interactive explanations comprised of four diagnostic properties. We conduct an extensive analysis across these three types of input feature explanations — each utilizing three different explanation techniques — across two datasets and two models, and reveal that each explanation has distinct strengths across the different diagnostic properties. Nevertheless, interactive span explanations outperform other types of input feature explanations across most diagnostic properties. Despite being relatively understudied, our analysis underscores the need for further research to improve methods generating these explanation types. Additionally, integrating them with other explanation types that perform better in certain characteristics could further enhance their overall effectiveness.
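
To make the difference between token explanations and token-interaction explanations concrete, here is a minimal sketch in Python. It is not the paper's framework or any of its evaluated methods. `toy_model` is a hypothetical scoring function invented for illustration, `shapley_values` computes exact Shapley values by enumerating token subsets, and `pair_interaction` is a simple second-order difference used here as a stand-in for an interaction score (it is not Bivariate Shapley).

```python
from itertools import combinations
from math import factorial


def toy_model(tokens):
    """Hypothetical scorer (an assumption, not a real model):
    'good' adds +1, but 'not' together with 'good' subtracts 2."""
    present = set(tokens)
    score = 0.0
    if "good" in present:
        score += 1.0
    if "not" in present and "good" in present:
        score -= 2.0  # negation only matters when both tokens are present
    return score


def shapley_values(tokens, model):
    """Exact Shapley value of each token, treating absent tokens as removed."""
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that excludes token i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = model([tokens[j] for j in subset])
                with_i = model([tokens[j] for j in sorted(subset + (i,))])
                values[i] += weight * (with_i - without_i)
    return values


def pair_interaction(tokens, model, i, j):
    """Second-order difference f({i,j}) - f({i}) - f({j}) + f({}),
    with every other token removed."""
    return (
        model([tokens[i], tokens[j]])
        - model([tokens[i]])
        - model([tokens[j]])
        + model([])
    )


if __name__ == "__main__":
    sentence = ["not", "good", "movie"]
    print(shapley_values(sentence, toy_model))
    print(pair_interaction(sentence, toy_model, 0, 1))
```

On this toy input the token-level attribution assigns roughly -1 to "not" and roughly 0 to both "good" and "movie", while the pairwise score for ("not", "good") is -2. The single-token view hides the fact that "good" matters only through its interaction with "not", which is the kind of gap that interaction-based explanations are meant to expose. Exact enumeration is exponential in the number of tokens, so it is only practical for very short inputs.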

arXiv information

Authors	Jingyi Sun, Pepa Atanasova, Isabelle Augenstein
Published	2025-02-07 15:11:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
