Learning by Self-Explaining

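Summary

Much explainable AI research treats explanations as a means of inspecting models.
However, this overlooks findings from human psychology on the benefits of self-explanation in an agent's learning process.
Motivated by this, we introduce a new workflow in the context of image classification called Learning by Self-Explaining (LSX).
LSX draws on aspects of self-refining AI and human-guided explanatory machine learning.
The underlying idea is that a learner model, in addition to being optimized for the original prediction task, is further optimized based on explanatory feedback from an internal critic model.
Intuitively, a learner's explanations are considered "useful" if the internal critic can perform the same task given those explanations.
We give an overview of the key components of LSX and, building on it, carry out an extensive experimental evaluation through three different example instantiations.
Our results show that Learning by Self-Explaining brings improvements on several levels: model generalization, reduced influence of confounding factors, and more task-relevant and faithful model explanations.
Overall, our work provides evidence for the potential of self-explanation in the learning phase of AI models.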

Summary (original)

Much of explainable AI research treats explanations as a means for model inspection. Yet, this neglects findings from human psychology that describe the benefit of self-explanations in an agent’s learning process. Motivated by this, we introduce a novel workflow in the context of image classification, termed Learning by Self-Explaining (LSX). LSX utilizes aspects of self-refining AI and human-guided explanatory machine learning. The underlying idea is that a learner model, in addition to optimizing for the original predictive task, is further optimized based on explanatory feedback from an internal critic model. Intuitively, a learner’s explanations are considered ‘useful’ if the internal critic can perform the same task given these explanations. We provide an overview of important components of LSX and, based on this, perform extensive experimental evaluations via three different example instantiations. Our results indicate improvements via Learning by Self-Explaining on several levels: in terms of model generalization, reducing the influence of confounding factors, and providing more task-relevant and faithful model explanations. Overall, our work provides evidence for the potential of self-explaining within the learning phase of an AI model.
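The learner/critic idea in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: the abstract targets image classification, whereas this toy uses a linear learner on synthetic two-feature data, and every name here (`lsx_toy`, `explain`, `grad_step`, `make_toy_data`, the weight `lam`) is a hypothetical choice made for illustration. The explanation is assumed to be the per-feature contribution `w_i * x_i`, and the critic is assumed to be a second linear classifier that sees only those explanations.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def explain(w, x):
    # Assumed explanation: contribution of each feature to the learner's logit.
    return [wi * xi for wi, xi in zip(w, x)]


def grad_step(w, inputs, labels, lr):
    # One gradient-descent step on the mean logistic cross-entropy.
    grad = [0.0] * len(w)
    for x, y in zip(inputs, labels):
        err = sigmoid(dot(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    n = len(inputs)
    return [wi - lr * gi / n for wi, gi in zip(w, grad)]


def accuracy(w, inputs, labels):
    hits = sum((dot(w, x) > 0) == (y == 1) for x, y in zip(inputs, labels))
    return hits / len(inputs)


def make_toy_data(n, seed=0):
    # Feature 0 determines the label; feature 1 is uninformative noise.
    rng = random.Random(seed)
    inputs, labels = [], []
    for k in range(n):
        y = k % 2
        center = 1.0 if y == 1 else -1.0
        inputs.append([center + rng.uniform(-0.5, 0.5), rng.uniform(-1.0, 1.0)])
        labels.append(y)
    return inputs, labels


def lsx_toy(inputs, labels, rounds=50, lr=0.5, lam=0.5, critic_steps=20):
    dim = len(inputs[0])
    learner = [0.0] * dim
    critic = [0.0] * dim
    for _ in range(rounds):
        # 1. Optimize the learner on the original prediction task.
        learner = grad_step(learner, inputs, labels, lr)
        # 2. The learner explains each of its decisions.
        expl = [explain(learner, x) for x in inputs]
        # 3. A fresh critic tries to solve the same task from explanations only.
        critic = [0.0] * dim
        for _ in range(critic_steps):
            critic = grad_step(critic, expl, labels, lr)
        # 4. Feedback: with the critic frozen, its logit is
        #    sum_i critic_i * learner_i * x_i, which is linear in the learner's
        #    weights with features critic_i * x_i. A step on those features is
        #    a gradient step on the critic's loss with respect to the learner.
        fb_inputs = [[ci * xi for ci, xi in zip(critic, x)] for x in inputs]
        learner = grad_step(learner, fb_inputs, labels, lr * lam)
    return learner, critic
```

Calling `lsx_toy(*make_toy_data(40))` returns the learner's weights and the critic from the final round. The weight `lam` controls how strongly the critic's feedback influences the learner relative to the task loss; its value here is arbitrary.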

arXiv information

Authors	Wolfgang Stammer, Felix Friedrich, David Steinmann, Manuel Brack, Hikaru Shindo, Kristian Kersting
Published	2024-09-17 16:24:49+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
