Transformer Neural Processes — Kernel Regression

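Summary

Stochastic processes model natural phenomena ranging from disease transmission to stock prices, but simulating them and quantifying their uncertainty can be computationally challenging.
For example, modeling a Gaussian process with standard statistical methods incurs an $\mathcal{O}(n^3)$ cost, and even state-of-the-art Neural Processes (NPs) incur an $\mathcal{O}(n^2)$ cost because of the attention mechanism.
The Transformer Neural Process – Kernel Regression (TNP-KR) is a new architecture built around a novel transformer block called the Kernel Regression Block (KRBlock).
By eliminating masked computations, the KRBlock reduces the computational complexity of attention in transformer-based Neural Processes (TNPs) from $\mathcal{O}((n_C+n_T)^2)$ to $\mathcal{O}(n_C^2+n_Cn_T)$, where $n_C$ is the number of context points and $n_T$ is the number of test points.
A fast attention variant further reduces all attention calculations to $\mathcal{O}(n_C)$ in both space and time complexity.
In benchmarks spanning meta-regression, Bayesian optimization, and image completion, the full variant matches the performance of state-of-the-art methods while training faster and scaling to two orders of magnitude more test points.
The fast variant nearly matches that performance while scaling to millions of both test and context points on consumer hardware.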
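
The complexity claim can be illustrated with a small NumPy sketch. This is not the authors' KRBlock: it is a single-head attention layer without learned projections, and it assumes the baseline mask lets every point attend to context points only. The function names (`attend`, `masked_joint_attention`, `split_attention`) are hypothetical and chosen for this illustration. Under those assumptions, computing one masked $(n_C+n_T)\times(n_C+n_T)$ score matrix gives the same outputs as running context self-attention and test-to-context cross-attention separately, while the latter only evaluates $n_C^2+n_Cn_T$ scores.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Plain scaled dot-product attention; also returns the number of scores computed.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v, scores.size

def masked_joint_attention(ctx, tst):
    # Baseline (assumed): one joint score matrix over context + test points,
    # with the test columns masked so that no point attends to a test point.
    x = np.concatenate([ctx, tst], axis=0)
    n_c = ctx.shape[0]
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores[:, n_c:] = -np.inf
    out = softmax(scores) @ x
    return out[:n_c], out[n_c:], scores.size

def split_attention(ctx, tst):
    # Same result without the masked entries:
    # context self-attention (n_C^2 scores) plus
    # test-to-context cross-attention (n_C * n_T scores).
    ctx_out, n_self = attend(ctx, ctx, ctx)
    tst_out, n_cross = attend(tst, ctx, ctx)
    return ctx_out, tst_out, n_self + n_cross

rng = np.random.default_rng(0)
ctx = rng.normal(size=(3, 4))   # n_C = 3 context points, feature dimension 4
tst = rng.normal(size=(5, 4))   # n_T = 5 test points

c1, t1, n_joint = masked_joint_attention(ctx, tst)
c2, t2, n_split = split_attention(ctx, tst)

print(np.allclose(c1, c2), np.allclose(t1, t2))  # True True
print(n_joint, n_split)                          # 64 24
```

With $n_C=3$ and $n_T=5$, the joint version computes $(3+5)^2=64$ scores and the split version $3^2+3\cdot5=24$. The gap widens as $n_T$ grows, because the split cost is linear in $n_T$ rather than quadratic.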

Abstract (original)

Stochastic processes model various natural phenomena from disease transmission to stock prices, but simulating and quantifying their uncertainty can be computationally challenging. For example, modeling a Gaussian Process with standard statistical methods incurs an $\mathcal{O}(n^3)$ penalty, and even using state-of-the-art Neural Processes (NPs) incurs an $\mathcal{O}(n^2)$ penalty due to the attention mechanism. We introduce the Transformer Neural Process – Kernel Regression (TNP-KR), a new architecture that incorporates a novel transformer block we call a Kernel Regression Block (KRBlock), which reduces the computational complexity of attention in transformer-based Neural Processes (TNPs) from $\mathcal{O}((n_C+n_T)^2)$ to $O(n_C^2+n_Cn_T)$ by eliminating masked computations, where $n_C$ is the number of context, and $n_T$ is the number of test points, respectively, and a fast attention variant that further reduces all attention calculations to $\mathcal{O}(n_C)$ in space and time complexity. In benchmarks spanning such tasks as meta-regression, Bayesian optimization, and image completion, we demonstrate that the full variant matches the performance of state-of-the-art methods while training faster and scaling two orders of magnitude higher in number of test points, and the fast variant nearly matches that performance while scaling to millions of both test and context points on consumer hardware.

arXiv information

Authors	Daniel Jenson, Jhonathan Navott, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman
Published	2024-11-19 13:40:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
