Towards Inferential Reproducibility of Machine Learning Research

Summary

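The reliability of machine learning evaluation (the consistency of observed evaluation scores across replicated model training runs) is affected by several sources of nondeterminism that can be regarded as measurement noise.
The current tendency to remove noise in order to enforce reproducibility of research results neglects inherent nondeterminism at the implementation level and disregards crucial interaction effects between algorithmic noise factors and data properties.
This limits the scope of the conclusions that can be drawn from such experiments.
Instead of removing noise, we propose incorporating several sources of variance, including their interaction with data properties, into an analysis of the significance and reliability of machine learning evaluation, with the aim of drawing inferences beyond particular instances of trained models.
We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores and how to conduct statistical inference with a generalized likelihood ratio test (GLRT).
This makes it possible to incorporate arbitrary sources of noise, such as meta-parameter variations, into statistical significance testing and to assess performance differences conditional on data properties.
Furthermore, a variance component analysis (VCA) enables an analysis of the contribution of each noise source to the overall variance and the computation of a reliability coefficient as the ratio of substantial to total variance.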
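
For orientation, the two statistical tools named above can be written down in their generic textbook form. The notation below is an assumption chosen for illustration and is not taken from the paper: y collects the evaluation scores, X encodes fixed effects (for example, which system produced a score), and Z encodes random effects (for example, test item or random seed).

```latex
% Generic LMEM and GLRT in assumed notation (not the paper's own).
\begin{align}
  y &= X\beta + Zb + \epsilon,
    \qquad b \sim \mathcal{N}(0, \Psi),
    \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I), \\
  \lambda &= \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)},
    \qquad -2 \log \lambda \;\overset{\text{approx.}}{\sim}\; \chi^2_{k}.
\end{align}
```

Here L is the likelihood, \Theta_0 is the parameter space restricted by the null hypothesis (for example, no difference between systems), and k is the difference in the number of free parameters between the two nested models. The chi-square approximation holds under the usual regularity conditions.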

Summary (original)

Reliability of machine learning evaluation — the consistency of observed evaluation scores across replicated model training runs — is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.
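
The reliability coefficient mentioned at the end of the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not from the source: the design is balanced (every test item is scored in the same number of replicated training runs), the between-item variance is treated as the substantial variance, and the variance components are estimated with the classical one-way ANOVA (method-of-moments) estimator rather than by fitting an LMEM. The function names and the example scores are hypothetical.

```python
from statistics import mean


def variance_components(scores):
    """Estimate (item variance, noise variance) for a balanced design.

    scores: list of rows, one row per test item, each row holding the
    scores of that item across k replicated training runs.
    Assumption: one-way random effects model, ANOVA estimator.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced design with >= 2 items and >= 2 runs")

    item_means = [mean(row) for row in scores]
    grand_mean = mean(item_means)

    # Mean squares between items and within items (across replicated runs).
    ms_between = k * sum((m - grand_mean) ** 2 for m in item_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, item_means) for x in row
    ) / (n * (k - 1))

    var_noise = ms_within
    # The moment estimator can go negative; truncate at zero.
    var_item = max(0.0, (ms_between - ms_within) / k)
    return var_item, var_noise


def reliability(scores):
    """Ratio of substantial (here: between-item) variance to total variance."""
    var_item, var_noise = variance_components(scores)
    total = var_item + var_noise
    return var_item / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical scores: 2 test items, 2 replicated training runs each.
    example = [[0.0, 2.0], [4.0, 6.0]]
    print(variance_components(example))
    print(reliability(example))
```

A value near 1 means that most of the score variance comes from differences between test items rather than from replication noise. With more than one noise source (for example, seeds and meta-parameters), an LMEM with several random effects would be needed instead of this one-way estimator.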

arXiv information

Authors	Michael Hagmann, Philipp Meier, Stefan Riezler
Published	2023-03-08 11:37:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
