Improving Network Interpretability via Explanation Consistency Evaluation

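Summary

Deep neural networks achieve remarkable performance, but their predictions tend to lack transparency.
Pursuing greater interpretability in a neural network often degrades its original performance.
Some prior work tries to improve both interpretability and performance, but it relies mainly on meticulously imposed conditions.
In this paper, we propose a simple yet effective framework that obtains more explainable activation heatmaps while also improving model performance, without requiring any extra supervision.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight training samples during model learning.
The explanation consistency metric measures the similarity between the model's visual explanation of an original sample and its visual explanation of a semantics-preserving adversarial sample whose background regions have been perturbed with image adversarial attack techniques.
Our framework then promotes model learning by paying closer attention to training samples with a large difference in explanations (that is, low explanation consistency), for which the current model cannot provide a robust interpretation.
Comprehensive experiments on various benchmarks demonstrate the advantages of our framework in several respects, including higher recognition accuracy, greater data debiasing capability, stronger network robustness, and more precise localization ability, on both regular and interpretable networks.
We also provide extensive ablation studies and qualitative analyses to reveal the detailed contribution of each component.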
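
The reweighting idea in the summary can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which similarity measure or weighting function is used, so cosine similarity between heatmaps, the exponential weighting, and the names `explanation_consistency`, `sample_weights`, `reweighted_loss`, and `temperature` are all assumptions made here for illustration.

```python
import numpy as np


def explanation_consistency(heatmap_orig, heatmap_adv, eps=1e-8):
    """Similarity between two explanation heatmaps.

    Assumption: cosine similarity of the flattened maps. The abstract only
    says "similarity", so this is one plausible choice, not the paper's.
    """
    a = np.asarray(heatmap_orig, dtype=float).ravel()
    b = np.asarray(heatmap_adv, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def sample_weights(consistencies, temperature=1.0):
    """Give larger weights to samples with lower explanation consistency.

    Assumption: exponential weighting, rescaled so the weights average to 1
    over the batch. `temperature` is a hypothetical knob.
    """
    c = np.asarray(consistencies, dtype=float)
    scores = np.exp((1.0 - c) / temperature)
    return scores * len(c) / scores.sum()


def reweighted_loss(per_sample_losses, weights):
    """Batch loss in which each sample's loss is scaled by its weight."""
    losses = np.asarray(per_sample_losses, dtype=float)
    return float(np.mean(losses * weights))
```

With this sketch, a sample whose heatmap moves after the background is perturbed receives a consistency near 0 and therefore a weight above 1, while a sample whose heatmap is unchanged receives a weight below 1. The per-sample losses would come from whatever training objective the model already uses.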

Summary (original)

While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet effective framework that acquires more explainable activation heatmaps and simultaneously increases the model performance, without the need for any extra supervision. Specifically, our concise framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. The explanation consistency metric is utilized to measure the similarity between the model’s visual explanations of the original samples and those of semantic-preserved adversarial samples, whose background regions are perturbed by using image adversarial attack techniques. Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations (i.e., low explanation consistency), for which the current model cannot provide robust interpretations. Comprehensive experimental results on various benchmarks demonstrate the superiority of our framework in multiple aspects, including higher recognition accuracy, greater data debiasing capability, stronger network robustness, and more precise localization ability on both regular networks and interpretable networks. We also provide extensive ablation studies and qualitative analyses to unveil the detailed contribution of each component.
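
The abstract mentions adversarial samples whose background regions are perturbed while the semantics are preserved. One way to picture that step is a perturbation confined to the background by a mask. The sketch below rests on several assumptions: a single FGSM-style sign-gradient step stands in for whichever attack the authors actually use, the foreground mask is assumed to be given (the abstract does not say how it is obtained), and `perturb_background`, `loss_grad`, `foreground_mask`, and `step` are hypothetical names.

```python
import numpy as np


def perturb_background(image, loss_grad, foreground_mask, step=2 / 255):
    """Apply one sign-gradient step only where the mask marks background.

    image, loss_grad, foreground_mask: arrays of the same shape.
    foreground_mask is 1 on the object and 0 on the background (assumed
    to be provided). Pixel values are assumed to lie in [0, 1].
    """
    image = np.asarray(image, dtype=float)
    background = 1.0 - np.asarray(foreground_mask, dtype=float)
    # Foreground pixels are multiplied by 0, so they are left untouched.
    adv = image + step * np.sign(loss_grad) * background
    return np.clip(adv, 0.0, 1.0)
```

In a real pipeline `loss_grad` would be the gradient of the model's loss with respect to the input image, computed by an autodiff framework. Here it is just an array, so the masking logic can be read in isolation.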

arXiv information

Authors	Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin
Published	2024-08-08 17:20:08+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
