Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data

Abstract (original)

For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by often much slower deep learning methods with extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) strong meta-tuned default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 118 datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 90 datasets, as well as the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results on medium-to-large tabular datasets (1K–500K samples) show that RealMLP offers a favorable time-accuracy tradeoff compared to other neural baselines and is competitive with GBDTs in terms of benchmark scores. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results without hyperparameter tuning. Finally, we demonstrate that some of RealMLP’s improvements can also considerably improve the performance of TabR with default parameters.
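The abstract does not say how RealMLP and the GBDTs are combined. The sketch below shows one generic way to combine several untuned, default-parameter models using only validation predictions: greedy ensemble selection with replacement (Caruana et al., 2004). This is an assumption chosen for illustration, not necessarily the paper's procedure. The model names, the Brier-score loss, the `rounds` parameter, and the toy predictions are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def brier(preds: List[float], targets: List[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


def greedy_ensemble(
    val_preds: Dict[str, List[float]], targets: List[int], rounds: int = 10
) -> Dict[str, float]:
    """Greedy ensemble selection with replacement (illustrative sketch).

    Each round adds the model whose inclusion gives the lowest validation
    Brier score for the averaged ensemble. Returns selection counts
    normalized to weights that sum to 1.
    """
    n = len(targets)
    running_sum = [0.0] * n  # summed predictions of the models selected so far
    counts: Counter = Counter()
    for r in range(1, rounds + 1):
        best_name, best_loss = None, float("inf")
        for name, preds in val_preds.items():
            # Average of the current selection plus this candidate (r members).
            candidate = [(s + p) / r for s, p in zip(running_sum, preds)]
            loss = brier(candidate, targets)
            if loss < best_loss:  # ties go to the earlier model in dict order
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return {name: c / rounds for name, c in counts.items()}


# Hypothetical validation predictions from two default-parameter models.
val_preds = {"RealMLP": [0.9, 0.2, 0.7, 0.4], "GBDT": [0.6, 0.1, 0.9, 0.3]}
targets = [1, 0, 1, 0]
weights = greedy_ensemble(val_preds, targets, rounds=10)
print(weights)
```

Because the weights are fitted on held-out predictions only, no hyperparameters of the underlying models are searched. The only extra cost is evaluating each model once on the validation set.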

arXiv information

Authors	David Holzmüller, Léo Grinsztajn, Ingo Steinwart
Published	2025-01-15 16:02:08+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
