Scalable Rule Lists Learning with Sampling

Abstract

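Given the growing importance of machine learning in socially important decision-making, learning interpretable models has become a major focus of machine learning research.
Among interpretable models, rule lists are among the best known and most easily interpreted.
However, finding an optimal rule list is computationally hard, and current approaches are impractical for large datasets.
We present a novel, scalable approach for learning nearly optimal rule lists from large datasets.
Our algorithm uses sampling to efficiently obtain an approximation of the optimal rule list, with rigorous guarantees on the quality of the approximation.
In particular, when a rule list with high accuracy exists, the algorithm is guaranteed to find a rule list whose accuracy is very close to that of the optimal rule list.
Our algorithm builds on the VC-dimension of rule lists, for which we prove novel upper and lower bounds.
In an experimental evaluation on large datasets, our algorithm identifies nearly optimal rule lists while running up to two orders of magnitude faster than state-of-the-art exact approaches.
Moreover, our algorithm is as fast as, and sometimes faster than, recent heuristic approaches, while reporting higher-quality rule lists.
In addition, the rules reported by our algorithm are more similar to the rules in the optimal rule list than those reported by heuristic approaches.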

Abstract (original)

Learning interpretable models has become a major focus of machine learning research, given the increasing prominence of machine learning in socially important decision-making. Among interpretable models, rule lists are among the best-known and easily interpretable ones. However, finding optimal rule lists is computationally challenging, and current approaches are impractical for large datasets. We present a novel and scalable approach to learn nearly optimal rule lists from large datasets. Our algorithm uses sampling to efficiently obtain an approximation of the optimal rule list with rigorous guarantees on the quality of the approximation. In particular, our algorithm guarantees to find a rule list with accuracy very close to the optimal rule list when a rule list with high accuracy exists. Our algorithm builds on the VC-dimension of rule lists, for which we prove novel upper and lower bounds. Our experimental evaluation on large datasets shows that our algorithm identifies nearly optimal rule lists with a speed-up up to two orders of magnitude over state-of-the-art exact approaches. Moreover, our algorithm is as fast as, and sometimes faster than, recent heuristic approaches, while reporting higher quality rule lists. In addition, the rules reported by our algorithm are more similar to the rules in the optimal rule list than the rules from heuristic approaches.
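To make the sampling idea concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes the standard eps-approximation bound for a family of VC-dimension d, under which a uniform sample of about (c / eps^2) * (d + ln(1/delta)) points keeps every rule list's sampled loss within eps of its loss on the full dataset with probability at least 1 - delta. The constant c = 0.5, the value passed as vc_dim, and all function names are illustrative assumptions; the paper's own bounds on the VC-dimension of rule lists and its exact guarantees are not reproduced here.

```python
import math
import random


def sample_size(vc_dim, eps, delta, c=0.5):
    # Standard eps-approximation bound: m >= (c / eps^2) * (d + ln(1/delta)).
    # c = 0.5 is an assumed constant, not a value taken from the paper.
    return math.ceil(c / eps ** 2 * (vc_dim + math.log(1.0 / delta)))


def predict(rule_list, default, x):
    # A rule is (condition, label). It fires when every feature in the
    # condition is present in x; the first rule that fires decides the label.
    for condition, label in rule_list:
        if condition <= x:
            return label
    return default


def loss(rule_list, default, data):
    # Fraction of (features, label) pairs that the rule list misclassifies.
    errors = sum(predict(rule_list, default, x) != y for x, y in data)
    return errors / len(data)


def sampled_loss(rule_list, default, data, m, seed=0):
    # Estimate the loss from m points drawn uniformly with replacement.
    rng = random.Random(seed)
    sample = rng.choices(data, k=m)
    return loss(rule_list, default, sample)


if __name__ == "__main__":
    # Hypothetical toy data: each point is (set of features, binary label).
    rules = [(frozenset({"a", "b"}), 1), (frozenset({"c"}), 0)]
    data = [(frozenset({"a", "b"}), 1), (frozenset({"c"}), 1), (frozenset({"a"}), 0)]
    m = sample_size(vc_dim=10, eps=0.1, delta=0.05)
    print("sample size:", m)
    print("full loss:", loss(rules, 1, data))
    print("sampled loss:", sampled_loss(rules, 1, data, m))
```

The sample size depends on eps, delta, and the VC-dimension but not on the number of data points, which is why searching for a good rule list on a sample can be much cheaper than searching on the full dataset.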

arXiv information

Authors	Leonardo Pellegrina, Fabio Vandin
Published	2024-06-18 17:15:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
