MotherNet: Fast Training and Inference via Hyper-Network Transformers

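Summary

Foundation models are transforming machine learning across many modalities, with in-context learning taking the place of classical model training.
Recent work on tabular data suggests a similar opportunity to build foundation models for classifying numerical data.
However, existing meta-learning approaches cannot compete with tree-based methods on inference time.
This paper proposes MotherNet, a hypernetwork architecture trained on synthetic classification tasks. Prompted with a previously unseen training set, it generates the weights of a trained "child" neural network through in-context learning in a single forward pass.
Most existing hypernetworks are trained for relatively constrained multi-task settings; MotherNet, in contrast, can create multiclass classification models for arbitrary tabular datasets without any dataset-specific gradient descent.
The child networks generated by MotherNet outperform neural networks trained with gradient descent on small datasets, and are comparable to TabPFN predictions and to standard ML methods such as gradient boosting.
Unlike a direct application of TabPFN, networks generated by MotherNet are highly efficient at inference time.
The paper also shows that HyperFast cannot perform effective in-context learning on small datasets and relies heavily on dataset-specific fine-tuning and hyperparameter tuning, whereas MotherNet requires neither fine-tuning nor per-dataset hyperparameters.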
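
The idea of a hypernetwork that emits the weights of a child network in one forward pass can be illustrated with a toy sketch. This is not the MotherNet architecture: the hypernetwork here is an untrained, hypothetical stand-in (a row embedding, mean pooling, and a linear decoder), and every name and size (`generate_child`, `child_predict_proba`, `W_embed`, `W_decode`, `HIDDEN`, `EMBED`) is an assumption made for illustration. It only shows the data flow: a training set goes in, child MLP weights come out, and inference then uses the small child network alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
N_FEATURES, N_CLASSES, HIDDEN, EMBED = 4, 3, 8, 16

# Sizes of the child MLP's parameter tensors: W1, b1, W2, b2.
CHILD_SIZES = [N_FEATURES * HIDDEN, HIDDEN, HIDDEN * N_CLASSES, N_CLASSES]

# Hypothetical, untrained hypernetwork parameters. In a real system these
# would be learned by meta-training on many synthetic classification tasks.
W_embed = rng.normal(scale=0.1, size=(N_FEATURES + N_CLASSES, EMBED))
W_decode = rng.normal(scale=0.1, size=(EMBED, sum(CHILD_SIZES)))


def generate_child(X_train, y_train):
    """One forward pass: training set in, child MLP weights out."""
    onehot = np.eye(N_CLASSES)[y_train]
    rows = np.concatenate([X_train, onehot], axis=1)
    # Embed each (x, y) row, then pool into a single dataset summary vector.
    summary = np.tanh(rows @ W_embed).mean(axis=0)
    # Decode the summary into one flat vector holding all child parameters.
    flat = summary @ W_decode
    w1, b1, w2, b2 = np.split(flat, np.cumsum(CHILD_SIZES)[:-1])
    return (w1.reshape(N_FEATURES, HIDDEN), b1,
            w2.reshape(HIDDEN, N_CLASSES), b2)


def child_predict_proba(child, X):
    """Inference uses only the small child MLP, not the hypernetwork."""
    w1, b1, w2, b2 = child
    hidden = np.maximum(X @ w1 + b1, 0.0)          # ReLU hidden layer
    logits = hidden @ w2 + b2
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Random stand-in data; no gradient descent is run on it.
X_train = rng.normal(size=(20, N_FEATURES))
y_train = rng.integers(0, N_CLASSES, size=20)

child = generate_child(X_train, y_train)
proba = child_predict_proba(child, rng.normal(size=(5, N_FEATURES)))
print(proba.shape)
```

Because the hypernetwork parameters above are random, the predictions are meaningless; the sketch only makes concrete why inference can be cheap, since each prediction costs two small matrix multiplications in the child network.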

Summary (original)

Foundation models are transforming machine learning across many modalities, with in-context learning replacing classical model training. Recent work on tabular data hints at a similar opportunity to build foundation models for classification for numerical data. However, existing meta-learning approaches can not compete with tree-based methods in terms of inference time. In this paper, we propose MotherNet, a hypernetwork architecture trained on synthetic classification tasks that, once prompted with a never-seen-before training set generates the weights of a trained “child” neural-network by in-context learning using a single forward pass. In contrast to most existing hypernetworks that are usually trained for relatively constrained multi-task settings, MotherNet can create models for multiclass classification on arbitrary tabular datasets without any dataset specific gradient descent. The child network generated by MotherNet outperforms neural networks trained using gradient descent on small datasets, and is comparable to predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of TabPFN, MotherNet generated networks are highly efficient at inference time. We also demonstrate that HyperFast is unable to perform effective in-context learning on small datasets, and heavily relies on dataset specific fine-tuning and hyper-parameter tuning, while MotherNet requires no fine-tuning or per-dataset hyper-parameters.

arXiv information

Authors	Andreas Müller, Carlo Curino, Raghu Ramakrishnan
Published	2025-05-09 16:02:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
