Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach

Abstract (original)

Recent advances in artificial intelligence (AI), including the release of several large models that offer only query access, make a strong case for explaining deep models in a post-hoc, gradient-free manner. In this paper, we propose a framework named distillation aided explainability (DAX) that generates saliency-based explanations in a model-agnostic, gradient-free setting. DAX poses explanation as a learning problem with two components: a mask generation network and a distillation network. The mask generation network learns to produce a multiplicative mask that identifies the salient regions of the input, while the student distillation network learns to approximate the local behavior of the black-box model. We jointly optimize the two networks using locally perturbed input samples, with targets derived from input-output access to the black-box model. We extensively evaluate DAX on classification tasks across two modalities (image and audio), using a diverse set of measures (intersection over union with ground truth, deletion-based measures, and subjective human evaluation), and benchmark it against 9 different methods. In these evaluations, DAX significantly outperforms existing approaches across all modalities and evaluation metrics.
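
The abstract describes two jointly trained components: a mask generator and a student that mimics the black box on locally perturbed inputs. The following is a minimal sketch of that idea. It rests on heavy simplifying assumptions that do not come from the paper. The "mask generation network" is reduced to a learnable per-feature logit vector. The student is a linear model. Perturbations zero out random features. The mask objective is a hypothetical choice: reproduce the black-box output on the original input through the student, plus a sparsity penalty. The names `dax_style_saliency` and `toy_black_box` are illustrative only. The property the sketch is meant to show is that the black box is only ever queried, and all gradients are computed through the student.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dax_style_saliency(
    black_box: Callable[[Vector], float],
    x: Vector,
    steps: int = 3000,
    lr_student: float = 0.05,
    lr_mask: float = 1.0,
    sparsity: float = 0.2,
    seed: int = 0,
) -> Vector:
    """Hypothetical, heavily simplified DAX-style explainer (not the paper's method).

    Assumptions: the mask generator is a free per-feature logit vector,
    the student is linear, and perturbations zero out random features of x.
    """
    rng = random.Random(seed)
    d = len(x)
    w = [0.0] * d          # student weights
    b = 0.0                # student bias
    logits = [0.0] * d     # mask parameters; mask = sigmoid(logits)
    y_x = black_box(x)     # single query on the unperturbed input

    for _ in range(steps):
        # 1) Local perturbation: keep each feature with probability 0.5.
        z = [xi if rng.random() < 0.5 else 0.0 for xi in x]
        y = black_box(z)   # target comes from input-output access only

        # 2) Distillation step: one SGD step fitting the student to the black box.
        err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
        w = [wi - lr_student * 2.0 * err * zi for wi, zi in zip(w, z)]
        b -= lr_student * 2.0 * err

        # 3) Mask step: the masked input should keep the student's output close
        #    to the black-box output on x while using as small a mask as possible.
        #    Gradients flow through the student, never through the black box.
        a = [sigmoid(t) for t in logits]
        s_m = sum(wi * xi * ai for wi, xi, ai in zip(w, x, a)) + b
        for i in range(d):
            g_a = 2.0 * (s_m - y_x) * w[i] * x[i] + sparsity / d
            logits[i] -= lr_mask * g_a * a[i] * (1.0 - a[i])

    return [sigmoid(t) for t in logits]


def toy_black_box(z: Vector) -> float:
    # Hypothetical query-only model that depends on features 0 and 2 only.
    return 2.0 * z[0] + 1.5 * z[2] ** 2


mask = dax_style_saliency(toy_black_box, [1.0, 1.0, 1.0, 1.0])
print([round(m, 2) for m in mask])
```

In this toy setting the black box depends only on features 0 and 2, so those mask entries should end up noticeably larger than the others. A faithful implementation would instead use neural networks for both components and operate on image or audio inputs.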

arXiv information

Authors	Debarpan Bhattacharya, Amir H. Poorjam, Deepak Mittal, Sriram Ganapathy
Published	2024-09-17 12:21:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google