Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients

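Summary

Deep learning models achieve high predictive performance but lack intrinsic interpretability, which hinders understanding of the prediction behavior they learn.
Existing local explanation methods focus on associations and neglect the causal drivers of model predictions.
Other approaches adopt a causal perspective but mainly provide more general, global explanations.
For a specific input, however, it is unclear whether globally identified factors also apply locally.
To address this limitation, we introduce a new framework for local interventional explanations that leverages recent advances in image-to-image editing models.
Our approach performs gradual interventions on semantic properties and quantifies the corresponding impact on a model's predictions with a new score, the expected property gradient magnitude.
We demonstrate the effectiveness of the approach through an extensive empirical evaluation across a wide range of architectures and tasks.
First, we validate it in a synthetic scenario and demonstrate its ability to identify biases locally.
We then apply the approach to analyze network training dynamics, investigate medical skin lesion classifiers, and study a pre-trained CLIP model using real-life interventional data.
Our results highlight the potential of property-level interventional explanations to reveal new insights into the behavior of deep models.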
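
The summary names the score but does not give its formula. As a rough illustration only, the sketch below assumes the score can be approximated by averaging the absolute finite-difference slope of a model's output along a sequence of gradually intervened inputs. The function name, its arguments, and the toy model are hypothetical and are not taken from the paper.

```python
import numpy as np


def expected_property_gradient_magnitude(property_values, model_outputs):
    """Mean absolute finite-difference slope of the output w.r.t. the property.

    Assumption (not from the source): the expectation is approximated by an
    unweighted average over the intervals between consecutive intervention
    steps.
    """
    x = np.asarray(property_values, dtype=float)
    y = np.asarray(model_outputs, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("need two 1-D sequences of equal length >= 2")
    dx = np.diff(x)
    if np.any(dx <= 0):
        raise ValueError("property_values must be strictly increasing")
    slopes = np.diff(y) / dx  # one slope per intervention interval
    return float(np.mean(np.abs(slopes)))


def toy_model(strength):
    """Hypothetical stand-in for a classifier score on edited images."""
    return 1.0 / (1.0 + np.exp(-8.0 * (strength - 0.5)))


# Hypothetical intervention strengths, e.g. 0 = original, 1 = fully edited.
strengths = np.linspace(0.0, 1.0, 11)
score = expected_property_gradient_magnitude(strengths, toy_model(strengths))
print(f"approximate score for the toy model: {score:.3f}")
```

Under this assumption, a score near zero means the prediction barely changes as the property is varied for this particular input, while a larger score indicates that the prediction is locally sensitive to the property.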

Abstract (original)

Deep learning models achieve high predictive performance but lack intrinsic interpretability, hindering our understanding of the learned prediction behavior. Existing local explainability methods focus on associations, neglecting the causal drivers of model predictions. Other approaches adopt a causal perspective but primarily provide more general global explanations. However, for specific inputs, it’s unclear whether globally identified factors apply locally. To address this limitation, we introduce a novel framework for local interventional explanations by leveraging recent advances in image-to-image editing models. Our approach performs gradual interventions on semantic properties to quantify the corresponding impact on a model’s predictions using a novel score, the expected property gradient magnitude. We demonstrate the effectiveness of our approach through an extensive empirical evaluation on a wide range of architectures and tasks. First, we validate it in a synthetic scenario and demonstrate its ability to locally identify biases. Afterward, we apply our approach to analyze network training dynamics, investigate medical skin lesion classifiers, and study a pre-trained CLIP model with real-life interventional data. Our results highlight the potential of interventional explanations on the property level to reveal new insights into the behavior of deep models.

arXiv information

Authors	Niklas Penzel, Joachim Denzler
Published	2025-03-07 13:50:37+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
