Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament

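Summary

Title: Understanding model complexity for temporal tabular and multivariate time series, using the Numerai data science tournament

Summary:
– Explores the use of various feature engineering and dimensionality reduction methods in multivariate time-series modelling.
– Uses a feature-target cross-correlation time-series dataset created from the Numerai tournament. It shows that, in the over-parameterised regime, the performance and the predictions of different feature engineering methods converge to the same equilibrium. This equilibrium can be characterised by a reproducing kernel Hilbert space.
– Proposes a new ensemble method for high-dimensional time-series modelling that combines different random non-linear transforms with ridge regression. Compared with deep learning models commonly used for sequence modelling, such as LSTMs and transformers, the method is more efficient. It is also more robust: it has lower model variance across random seeds and is less sensitive to the choice of architecture.
– A further advantage of the method is model simplicity, since it does not require a sophisticated deep learning framework such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem of the Numerai tournament. There, their predictive power exceeds that of a baseline prediction model based on moving averages.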
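The ensemble idea summarised above (random non-linear transforms followed by ridge regression) can be sketched with NumPy alone, which also illustrates the point that no deep learning framework is needed. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for illustration:

- the lagged-window input construction;
- the tanh/cos/ReLU transforms and the Gaussian projection scale;
- the names `make_lagged`, `RandomFeatureRidge`, `fit_ensemble` and `predict_ensemble`.

```python
import numpy as np


def make_lagged(series, lags):
    """Turn a (T, F) multivariate series into supervised pairs:
    X[t] is the flattened window series[t:t+lags], Y[t] is series[t+lags]."""
    T = series.shape[0]
    X = np.stack([series[t:t + lags].ravel() for t in range(T - lags)])
    Y = series[lags:]
    return X, Y


def ridge_fit(H, Y, alpha):
    """Closed-form ridge regression: W = (H^T H + alpha I)^-1 H^T Y."""
    return np.linalg.solve(H.T @ H + alpha * np.eye(H.shape[1]), H.T @ Y)


class RandomFeatureRidge:
    """One ensemble member: a fixed (never trained) random projection, an
    elementwise non-linearity, then a ridge read-out."""

    def __init__(self, n_features, activation, alpha, seed):
        self.n_features, self.activation, self.alpha = n_features, activation, alpha
        self.rng = np.random.default_rng(seed)

    def _transform(self, X):
        return self.activation(X @ self.W_in + self.b)

    def fit(self, X, Y):
        d = X.shape[1]
        # Assumption: Gaussian random weights scaled by 1/sqrt(d), uniform biases.
        self.W_in = self.rng.normal(scale=1.0 / np.sqrt(d), size=(d, self.n_features))
        self.b = self.rng.uniform(-np.pi, np.pi, size=self.n_features)
        self.W_out = ridge_fit(self._transform(X), Y, self.alpha)
        return self

    def predict(self, X):
        return self._transform(X) @ self.W_out


# Hypothetical choice of non-linear transforms; the paper's exact set may differ.
ACTIVATIONS = {
    "tanh": np.tanh,
    "cos": np.cos,
    "relu": lambda z: np.maximum(z, 0.0),
}


def fit_ensemble(X, Y, n_features=512, alpha=1.0, n_seeds=3):
    """Fit one member per (transform, seed) pair."""
    members = []
    for i, act in enumerate(ACTIVATIONS.values()):
        for s in range(n_seeds):
            seed = i * n_seeds + s  # distinct seed per member
            members.append(RandomFeatureRidge(n_features, act, alpha, seed).fit(X, Y))
    return members


def predict_ensemble(members, X):
    """Average the members' predictions."""
    return np.mean([m.predict(X) for m in members], axis=0)


# Toy multivariate series (hypothetical data): 300 time steps, 5 variables.
rng = np.random.default_rng(42)
series = 0.1 * np.cumsum(rng.normal(size=(300, 5)), axis=0)
X, Y = make_lagged(series, lags=10)
members = fit_ensemble(X[:250], Y[:250])
pred = predict_ensemble(members, X[250:])
print(pred.shape)  # (40, 5)
```

Only the ridge read-out is fitted, and it has a closed-form solution, so each member trains with a single linear solve. Averaging over several transforms and seeds is one simple way to reduce the variance that comes from any single random projection.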

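Key points:
– Built multivariate time-series models using a time-series dataset derived from the Numerai tournament.
– Proposed a new ensemble method and showed that it is more robust and more efficient than deep learning models.
– Showed that the method is simple and does not require a sophisticated framework such as PyTorch.
– Showed that feature rankings from the new method have more predictive power than a baseline prediction model based on moving averages.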
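The moving-average baseline mentioned above can be illustrated on a feature-target cross-correlation time series. This is a minimal sketch, not the authors' pipeline. It makes the following assumptions:

- Each era's correlation is a plain Pearson correlation.
- The baseline scores each feature by the mean of its last `window` per-era correlations.
- The names `era_correlations` and `moving_average_ranking` are hypothetical.
- The toy data are synthetic.

```python
import numpy as np


def era_correlations(features, target, eras):
    """Per-era Pearson correlation between every feature column and the target.

    Returns the sorted unique eras and a (n_eras, n_features) array, i.e. a
    multivariate time series with one column per feature.
    """
    unique_eras = np.unique(eras)
    corr = np.empty((len(unique_eras), features.shape[1]))
    for i, era in enumerate(unique_eras):
        mask = eras == era
        f = features[mask] - features[mask].mean(axis=0)  # centre each feature
        y = target[mask] - target[mask].mean()            # centre the target
        denom = np.sqrt((f ** 2).sum(axis=0) * (y ** 2).sum())
        corr[i] = (f * y[:, None]).sum(axis=0) / denom
    return unique_eras, corr


def moving_average_ranking(corr, window):
    """Assumed baseline: score each feature by its mean correlation over the
    last `window` eras, then rank features from highest to lowest score."""
    score = corr[-window:].mean(axis=0)
    order = np.argsort(-score)
    return order, score


# Toy data (hypothetical): feature 0 is built to help, feature 2 to hurt.
rng = np.random.default_rng(0)
n_eras, per_era, n_feat = 60, 200, 4
eras = np.repeat(np.arange(n_eras), per_era)
features = rng.normal(size=(n_eras * per_era, n_feat))
noise = rng.normal(size=n_eras * per_era)
target = 0.5 * features[:, 0] - 0.3 * features[:, 2] + noise

_, corr = era_correlations(features, target, eras)
order, score = moving_average_ranking(corr, window=20)
print(corr.shape)  # (60, 4)
print(order)       # feature 0 first, feature 2 last
```

A learned model would replace `moving_average_ranking` by forecasting the next row of `corr` from its history. The forecasted scores would then be ranked in the same way.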

Summary (original)

In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multi-variate time-series modelling. Using a feature-target cross-correlation time-series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and the predictions of different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We suggest a new ensemble method, which combines different random non-linear transforms followed by ridge regression, for modelling high-dimensional time series. Compared to some commonly used deep learning models for sequence modelling, such as LSTMs and transformers, our method is more robust (lower model variance over different random seeds and less sensitive to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity, as there is no need to use sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of feature rankings obtained from our method is better than that of the baseline prediction model based on moving averages.
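The abstract says over-parameterised models converge to an equilibrium characterised by a reproducing kernel Hilbert space. A well-known concrete instance is ridge regression on D random Fourier features: it approximates kernel ridge regression with a Gaussian kernel, and the two sets of predictions approach each other as D grows. The sketch below is a generic illustration of that effect, not an experiment from the paper. The Gaussian kernel, the cosine features, the toy data, the bandwidth `sigma` and the penalty `lam` are all assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kernel_ridge_predict(X, y, X_new, sigma, lam):
    """Exact kernel ridge regression in the RKHS of the Gaussian kernel."""
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel(X_new, X, sigma) @ alpha


def rff_ridge_predict(X, y, X_new, sigma, lam, D, seed):
    """Ridge regression on D random Fourier features sqrt(2/D) cos(W^T x + b)."""
    rng = np.random.default_rng(seed)
    # Frequencies ~ N(0, sigma^-2 I) so that E[z(a) . z(b)] = k(a, b).
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)

    def feat(A):
        return np.sqrt(2.0 / D) * np.cos(A @ W + b)

    Z = feat(X)
    # Dual form: Z^T (Z Z^T + lam I)^-1 y equals (Z^T Z + lam I)^-1 Z^T y,
    # but needs only an n x n solve, which is cheap when D >> n.
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(X)), y)
    return feat(X_new) @ (Z.T @ alpha)


# Toy regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sin(X).sum(axis=1) + 0.1 * rng.normal(size=100)
X_new = rng.normal(size=(50, 3))

exact = kernel_ridge_predict(X, y, X_new, sigma=1.0, lam=1.0)
errors = []
for D in (10, 100, 1000, 10000, 50000):
    approx = rff_ridge_predict(X, y, X_new, sigma=1.0, lam=1.0, D=D, seed=1)
    errors.append(np.mean(np.abs(approx - exact)))
    print(D, errors[-1])  # gap to the exact kernel solution shrinks as D grows
```

The number of random features D plays the role of model width here. Once D is large, changing the random seed or the feature count moves the predictions very little, because they are pinned to the kernel solution.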

arXiv information

Authors	Thomas Wong, Mauricio Barahona
Published	2023-04-19 14:39:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
