VisTabNet: Adapting Vision Transformers for Tabular Data

Summary

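Deep learning models have been highly successful in natural language processing and computer vision, but comparable gains have not been observed for tabular data, which remains the most common data type in biological, industrial, and financial applications.
In particular, transferring large pre-trained models to downstream tasks defined on small tabular datasets is difficult.
To address this, the authors propose VisTabNet, a cross-modal transfer learning method that adapts a Vision Transformer (ViT) with pre-trained weights to process tabular data.
By projecting tabular inputs into patch embeddings that ViT accepts, a pre-trained Transformer encoder can be applied directly to tabular inputs.
This approach removes the conceptual cost of designing a suitable architecture for tabular data and reduces the computational cost of training a model from scratch.
Experiments on multiple small tabular datasets (fewer than 1k samples) show that VisTabNet outperforms both traditional ensemble methods and recent deep learning models.
The method goes beyond conventional transfer learning practice: it shows that pre-trained image models can be transferred to solve tabular problems, extending the boundaries of transfer learning.
An example implementation is available as a GitHub repository at https://github.com/wwydmanski/VisTabNet.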
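
The projection step described above can be illustrated with a minimal sketch. It assumes a single linear map that turns one tabular row into a short sequence of token vectors with the width a ViT-Base encoder expects (768). The function names, the token count of 8, and the random initialisation are hypothetical choices for illustration and are not taken from the VisTabNet repository; the pre-trained encoder itself is omitted.

```python
import random

N_TOKENS = 8      # assumed number of pseudo-patches per row (illustrative only)
EMBED_DIM = 768   # token width of a ViT-Base encoder


def make_projection(n_features, n_tokens=N_TOKENS, embed_dim=EMBED_DIM, seed=0):
    """Randomly initialise a linear map: n_features -> n_tokens * embed_dim."""
    rng = random.Random(seed)
    scale = n_features ** -0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_tokens * embed_dim)]
    bias = [0.0] * (n_tokens * embed_dim)
    return weights, bias


def row_to_tokens(row, weights, bias, n_tokens=N_TOKENS, embed_dim=EMBED_DIM):
    """Project one tabular row into n_tokens vectors of size embed_dim."""
    # One dot product per output unit, then cut the flat vector into tokens.
    flat = [sum(w * x for w, x in zip(w_row, row)) + b
            for w_row, b in zip(weights, bias)]
    return [flat[i * embed_dim:(i + 1) * embed_dim] for i in range(n_tokens)]


# Example: a row with 10 standardised numeric features (made-up values).
row = [0.5, -1.2, 0.0, 2.1, 0.3, -0.7, 1.4, 0.9, -0.1, 0.6]
weights, bias = make_projection(n_features=len(row))
tokens = row_to_tokens(row, weights, bias)
print(len(tokens), len(tokens[0]))  # sequence length and token width
```

In a real setup the projection would be a trainable layer whose output is fed to a pre-trained encoder in place of image patch embeddings; the abstract does not state which encoder weights are frozen or fine-tuned.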

Abstract (original)

Although deep learning models have had great success in natural language processing and computer vision, we do not observe comparable improvements in the case of tabular data, which is still the most common data type used in biological, industrial and financial applications. In particular, it is challenging to transfer large-scale pre-trained models to downstream tasks defined on small tabular datasets. To address this, we propose VisTabNet — a cross-modal transfer learning method, which allows for adapting Vision Transformer (ViT) with pre-trained weights to process tabular data. By projecting tabular inputs to patch embeddings acceptable by ViT, we can directly apply a pre-trained Transformer Encoder to tabular inputs. This approach eliminates the conceptual cost of designing a suitable architecture for processing tabular data, while reducing the computational cost of training the model from scratch. Experimental results on multiple small tabular datasets (less than 1k samples) demonstrate VisTabNet’s superiority, outperforming both traditional ensemble methods and recent deep learning models. The proposed method goes beyond conventional transfer learning practice and shows that pre-trained image models can be transferred to solve tabular problems, extending the boundaries of transfer learning. We share our example implementation as a GitHub repository available at https://github.com/wwydmanski/VisTabNet.

arXiv information

Authors	Witold Wydmański, Ulvi Movsum-zada, Jacek Tabor, Marek Śmieja
Published	2025-04-25 12:19:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
