LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

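Summary

Automated feature engineering plays an important role in improving predictive model performance on tabular learning tasks.
Traditional automated feature engineering methods are limited by their reliance on predefined transformations within fixed, manually designed search spaces, and they often ignore domain knowledge.
Recent work with large language models (LLMs) has made it possible to integrate domain knowledge into the feature engineering process.
However, existing LLM-based approaches either use direct prompting or rely solely on validation scores for feature selection; they neither draw on insights from prior feature discovery experiments nor establish meaningful reasoning between feature generation and data-driven performance.
To address these challenges, we propose LLM-FE, a new framework that combines evolutionary search with the domain knowledge and reasoning capabilities of LLMs to automatically discover effective features for tabular learning tasks.
LLM-FE formulates feature engineering as a program search problem, in which LLMs iteratively propose new feature transformation programs and data-driven feedback guides the search.
Our results show that LLM-FE consistently outperforms state-of-the-art baselines and significantly improves the performance of tabular prediction models across diverse classification and regression benchmarks.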

Abstract (original)

Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within fixed, manually designed search spaces, often neglecting domain knowledge. Recent advances using Large Language Models (LLMs) have enabled the integration of domain knowledge into the feature engineering process. However, existing LLM-based approaches use direct prompting or rely solely on validation scores for feature selection, failing to leverage insights from prior feature discovery experiments or establish meaningful reasoning between feature generation and data-driven performance. To address these challenges, we propose LLM-FE, a novel framework that combines evolutionary search with the domain knowledge and reasoning capabilities of LLMs to automatically discover effective features for tabular learning tasks. LLM-FE formulates feature engineering as a program search problem, where LLMs propose new feature transformation programs iteratively, and data-driven feedback guides the search process. Our results demonstrate that LLM-FE consistently outperforms state-of-the-art baselines, significantly enhancing the performance of tabular prediction models across diverse classification and regression benchmarks.
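
The abstract describes feature engineering as a program search in which candidate transformations are proposed iteratively and kept or discarded according to data-driven feedback. The loop below is a minimal sketch of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic columns `a` and `b`, the `(left, op, right)` program encoding, the one-feature linear fit used as the validation score, and `stub_propose`, a random mutation that stands in for the LLM call.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Program = Tuple[str, str, str]          # hypothetical encoding: (left column, operator, right column)
Dataset = Tuple[List[Row], List[float]]

COLUMNS = ["a", "b"]
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}

def make_data(n: int, seed: int) -> Dataset:
    """Synthetic table whose target is a * b (an assumption for this demo)."""
    rng = random.Random(seed)
    rows = [{"a": rng.uniform(1, 5), "b": rng.uniform(1, 5)} for _ in range(n)]
    return rows, [r["a"] * r["b"] for r in rows]

def apply(prog: Program, row: Row) -> float:
    left, op, right = prog
    return OPS[op](row[left], row[right])

def evaluate(prog: Program, train: Dataset, valid: Dataset) -> float:
    """Fit y ~ slope * feature + intercept on train; return negative validation MSE."""
    xtr = [apply(prog, r) for r in train[0]]
    xva = [apply(prog, r) for r in valid[0]]
    ytr, yva = train[1], valid[1]
    mx, my = mean(xtr), mean(ytr)
    var = sum((x - mx) * (x - mx) for x in xtr)
    cov = sum((x - mx) * (y - my) for x, y in zip(xtr, ytr))
    slope = cov / var if var > 0 else 0.0   # constant features get a flat fit
    intercept = my - slope * mx
    return -mean([(slope * x + intercept - y) ** 2 for x, y in zip(xva, yva)])

def stub_propose(parents: List[Program], rng: random.Random) -> Program:
    """Stand-in for the LLM: mutate one slot of a randomly chosen parent program."""
    left, op, right = rng.choice(parents)
    slot = rng.randrange(3)
    if slot == 0:
        left = rng.choice(COLUMNS)
    elif slot == 1:
        op = rng.choice(sorted(OPS))
    else:
        right = rng.choice(COLUMNS)
    return (left, op, right)

def search(iterations: int = 50, keep: int = 3, seed: int = 0) -> Tuple[float, Program]:
    rng = random.Random(seed)
    train, valid = make_data(200, seed=1), make_data(100, seed=2)
    start: Program = ("a", "+", "b")
    population = [(evaluate(start, train, valid), start)]
    for _ in range(iterations):
        parents = [prog for _, prog in population]
        child = stub_propose(parents, rng)
        population.append((evaluate(child, train, valid), child))
        # Elitist selection: keep only the best-scoring distinct programs.
        population = sorted(set(population), reverse=True)[:keep]
    return population[0]

if __name__ == "__main__":
    score, program = search()
    print("best program:", " ".join(program), "score:", score)
```

In a real system the proposer would presumably receive the top-scoring programs as part of a prompt and return new transformation code, and the score would come from a downstream tabular model rather than a one-feature linear fit.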

arXiv information

Authors	Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy
Published	2025-03-18 17:11:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
