Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks

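Summary

This paper introduces a new concept learning framework that improves both model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and trains it adversarially.
During training, the explanation module is optimized to extract visual concepts from the classifier's latent representations, while the GAN-based module learns to distinguish images generated from those concepts from real images.
This joint training scheme lets the model implicitly align its internally learned concepts with human-interpretable visual properties.
Comprehensive experiments show that the approach is robust and produces coherent concept activations.
We analyse the learned concepts and show that they correspond semantically to object parts and visual attributes.
We also study how perturbations in the adversarial training protocol affect both classification and concept acquisition.
In summary, this work is a significant step towards inherently interpretable deep vision models with task-aligned concept representations, a key enabler for trustworthy AI in real-world perception tasks.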

Summary (original)

This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. During training, the explanation module is optimized to extract visual concepts from the classifier’s latent representations, while the GAN-based module aims to discriminate images generated from concepts, from true images. This joint training scheme enables the model to implicitly align its internally learned concepts with human-interpretable visual properties. Comprehensive experiments demonstrate the robustness of our approach, while producing coherent concept activations. We analyse the learned concepts, showing their semantic concordance with object parts and visual attributes. We also study how perturbations in the adversarial training protocol impact both classification and concept acquisition. In summary, this work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations – a key enabler for developing trustworthy AI for real-world perception tasks.
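
The abstract describes two objectives that are trained jointly: a classification loss for the primary network and an adversarial signal from a module that separates real images from images generated out of the learned concepts. The sketch below shows one plausible way such terms could be combined. It is an illustration only, not the authors' implementation: the non-saturating GAN loss, the function names, and the weight `adv_weight` are all assumptions that do not appear in the source, and the discriminator outputs are passed in as plain probabilities rather than computed by a network.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Standard GAN discriminator loss.

    d_real: discriminator probability that a true image is real.
    d_fake: discriminator probability that a concept-generated image is real.
    """
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))


def generator_adversarial_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss (an assumed choice, not from the paper)."""
    return -math.log(d_fake + eps)


def model_loss(logits, label, d_fake, adv_weight=0.1):
    """Hypothetical joint objective for the classifier and explanation module.

    adv_weight is an illustrative hyperparameter, not a value from the paper.
    """
    return cross_entropy(logits, label) + adv_weight * generator_adversarial_loss(d_fake)


if __name__ == "__main__":
    logits = [2.0, 0.0]  # toy two-class prediction
    print("classifier + explainer loss:", model_loss(logits, label=0, d_fake=0.3))
    print("discriminator loss:", discriminator_loss(d_real=0.8, d_fake=0.3))
```

In a setup like this, the discriminator would be updated to lower `discriminator_loss` while the classifier and explanation module are updated to lower `model_loss`, so concepts that yield more realistic generated images are rewarded alongside correct classification.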

arXiv information

Authors	Tanmay Garg, Deepika Vemuri, Vineeth N Balasubramanian
Published	2024-01-09 16:16:16+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
