State-space models can learn in-context by gradient descent

Abstract

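Deep state-space models (deep SSMs) are growing in popularity as an effective approach to modeling sequence data.
Like transformers, they have also been shown to be capable of in-context learning.
However, a complete picture of how SSMs can perform in-context learning has been missing.
This study provides a direct and explicit construction showing that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers.
Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the output of an implicit linear model with least-squares loss after one step of gradient descent.
We then show a straightforward extension to multi-step linear and non-linear regression tasks.
We validate the construction by training randomly initialized augmented SSMs on linear and non-linear regression tasks.
The parameters obtained empirically through optimization match those predicted analytically by the theoretical construction.
Overall, we elucidate the role of input and output gating in recurrent architectures as the key inductive biases that enable the expressive power typical of foundation models.
We also provide new insights into the relationship between state-space models and linear self-attention, and into their ability to learn in-context.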
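
The one-step claim can be illustrated numerically. The following is a minimal NumPy sketch under stated assumptions, not the paper's actual construction: it assumes zero initial weights, an identity state transition, and a learning rate scaled by the number of context examples. The function names `gd_one_step_prediction` and `gated_recurrence_prediction` are hypothetical. It checks that a linear recurrence whose input is the multiplicative (outer-product) combination of each context pair, and whose output is the state multiplied by the query, gives the same prediction as an implicit linear model after one gradient-descent step on the least-squares loss.

```python
import numpy as np


def gd_one_step_prediction(X, Y, x_query, lr):
    """Predict with a linear model W x after one gradient step from W = 0.

    Loss: L(W) = 1/(2n) * sum_i ||W x_i - y_i||^2
    """
    n = X.shape[0]
    W0 = np.zeros((Y.shape[1], X.shape[1]))
    # Gradient of L with respect to W, evaluated at W0.
    grad = (W0 @ X.T - Y.T) @ X / n
    W1 = W0 - lr * grad
    return W1 @ x_query


def gated_recurrence_prediction(X, Y, x_query, lr):
    """Same prediction from a linear recurrence with multiplicative gating.

    Assumption for illustration: the state transition is the identity,
    so the state simply accumulates the gated inputs.
    """
    n = X.shape[0]
    state = np.zeros((Y.shape[1], X.shape[1]))
    for x_t, y_t in zip(X, Y):
        # Input gating: the target multiplies the input (outer product).
        state = state + np.outer(y_t, x_t)
    # Output gating: the accumulated state multiplies the query input.
    return (lr / n) * (state @ x_query)


# Small random regression context (illustrative sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # 8 context inputs of dimension 3
Y = rng.normal(size=(8, 2))      # 8 context targets of dimension 2
x_query = rng.normal(size=3)

pred_gd = gd_one_step_prediction(X, Y, x_query, lr=0.1)
pred_ssm = gated_recurrence_prediction(X, Y, x_query, lr=0.1)
print(np.allclose(pred_gd, pred_ssm))
```

The two functions agree because, starting from zero weights, one gradient step yields a weight matrix proportional to the sum of outer products of targets and inputs, which is exactly what the recurrence accumulates in its state.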

Abstract (original)

Deep state-space models (Deep SSMs) are becoming popular as effective approaches to model sequence data. They have also been shown to be capable of in-context learning, much like transformers. However, a complete picture of how SSMs might be able to do in-context learning has been missing. In this study, we provide a direct and explicit construction to show that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers. Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the outputs of an implicit linear model with least squares loss after one step of gradient descent. We then show a straightforward extension to multi-step linear and non-linear regression tasks. We validate our construction by training randomly initialized augmented SSMs on linear and non-linear regression tasks. The empirically obtained parameters through optimization match the ones predicted analytically by the theoretical construction. Overall, we elucidate the role of input- and output-gating in recurrent architectures as the key inductive biases for enabling the expressive power typical of foundation models. We also provide novel insights into the relationship between state-space models and linear self-attention, and their ability to learn in-context.

arXiv information

Authors	Neeraj Mohan Sushma, Yudou Tian, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney
Published	2025-02-18 18:55:39+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
