Are machine learning interpretations reliable? A stability study on global interpretations

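Summary

As machine learning systems are increasingly used in high-stakes domains, growing emphasis is being placed on making them interpretable in order to improve trust in these systems.
In response, a range of interpretable machine learning (IML) methods have been developed to produce human-understandable insights into otherwise black-box models.
These methods raise a fundamental question: are these interpretations reliable?
Unlike prediction accuracy or other evaluation metrics for supervised models, proximity to the true interpretation is difficult to define.
Instead, the authors ask a closely related question that they argue is a prerequisite for reliability: are these interpretations stable?
They define stability as findings that remain consistent or reliable under small random perturbations to the data or algorithms.
The study conducts the first systematic, large-scale empirical stability study of popular machine learning global interpretations, covering both supervised and unsupervised tasks on tabular data.
The findings show that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves, and that there is no association between the accuracy of machine learning predictions and the stability of their associated interpretations.
Moreover, no single method consistently provides the most stable interpretations across a range of benchmark datasets.
Overall, these results suggest that interpretability alone does not warrant trust, and they underscore the need for rigorous evaluation of interpretation stability in future work.
To support these principles, the authors have developed and released an open-source IML dashboard and Python package that let researchers assess the stability and reliability of their own data-driven interpretations and discoveries.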
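
The definition of stability above (consistency under small random perturbations to the data) can be illustrated with a minimal sketch. This is not the authors' method or their Python package: the interpretation used here (top-k features ranked by absolute Pearson correlation), the bootstrap perturbation, the Jaccard-based score, and all function names (`top_k_features`, `interpretation_stability`, `make_data`) are assumptions chosen only to keep the example self-contained.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_k_features(X, y, k):
    """Stand-in global interpretation: indices of the k features with largest |corr| with y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def interpretation_stability(X, y, k=2, n_repeats=20, seed=0):
    """Mean pairwise Jaccard similarity of top-k feature sets across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    selections = []
    for _ in range(n_repeats):
        # Small random perturbation of the data: resample rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        selections.append(top_k_features([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n, noise, seed=1):
    """Hypothetical toy data: 6 Gaussian features, y depends only on features 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(n)]
    y = [row[0] + row[1] + rng.gauss(0, noise) for row in X]
    return X, y
```

Calling `interpretation_stability(*make_data(200, 0.1))` scores a strong-signal dataset, and `interpretation_stability(*make_data(30, 50.0))` scores a small, noisy one. A score near 1 means the same features are selected under every resample; lower scores mean the interpretation changes with the perturbation. The score measures only the consistency of the selected features, not whether they are correct.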

Abstract (original)

As machine learning systems are increasingly used in high-stakes domains, there is a growing emphasis placed on making them interpretable to improve trust in these systems. In response, a range of interpretable machine learning (IML) methods have been developed to generate human-understandable insights into otherwise black box models. With these methods, a fundamental question arises: Are these interpretations reliable? Unlike with prediction accuracy or other evaluation metrics for supervised models, the proximity to the true interpretation is difficult to define. Instead, we ask a closely related question that we argue is a prerequisite for reliability: Are these interpretations stable? We define stability as findings that are consistent or reliable under small random perturbations to the data or algorithms. In this study, we conduct the first systematic, large-scale empirical stability study on popular machine learning global interpretations for both supervised and unsupervised tasks on tabular data. Our findings reveal that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves, and that there is no association between the accuracy of machine learning predictions and the stability of their associated interpretations. Moreover, we show that no single method consistently provides the most stable interpretations across a range of benchmark datasets. Overall, these results suggest that interpretability alone does not warrant trust, and underscores the need for rigorous evaluation of interpretation stability in future work. To support these principles, we have developed and released an open source IML dashboard and Python package to enable researchers to assess the stability and reliability of their own data-driven interpretations and discoveries.

arXiv information

Authors	Luqin Gan, Tarek M. Zikry, Genevera I. Allen
Published	2025-05-21 16:34:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
