State Soup: In-Context Skill Learning, Retrieval and Mixing

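Summary

A new class of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Because the cost of processing a new input does not depend on sequence length, these models handle long sequences efficiently by design.
Here we explore a further advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation.
Building on the parallels between fine-tuning and in-context learning, we exploit the linearity of the recurrence to investigate whether internal states can be treated as task vectors that are stored, retrieved, and linearly combined.
We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation is sufficient to improve next-token perplexity as well as performance on downstream in-context learning tasks.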

Summary (original)

A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning task performance.
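The idea of storing states and combining them linearly can be illustrated with a toy example. The sketch below is not the authors' implementation and does not use Mamba: it assumes a single-layer, elementwise gated linear recurrence with a hypothetical sigmoid gate, and the names `run`, `mix`, and `library` are invented for illustration. It shows that, when the gate depends only on the input, interpolating two stored states (weights summing to 1) and then processing a query gives the same result as processing the query from each state and interpolating afterwards.

```python
import math
import random


def gate(x):
    """Input-dependent forget gate in (0, 1); a stand-in for a learned gate."""
    return 1.0 / (1.0 + math.exp(-x))


def run(tokens, state):
    """Apply h <- a(x) * h + (1 - a(x)) * x elementwise over a token sequence."""
    h = list(state)
    for x in tokens:  # each token is a vector with the same dimension as the state
        h = [gate(xi) * hi + (1.0 - gate(xi)) * xi for hi, xi in zip(h, x)]
    return h


def mix(states, weights):
    """Linear combination of stored states; weights are assumed to sum to 1."""
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


rng = random.Random(0)
DIM = 4


def rand_seq(n):
    """Random toy 'context' of n token vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


# Process two contexts once and store the resulting states for later retrieval.
zero = [0.0] * DIM
library = {
    "task_a": run(rand_seq(5), zero),
    "task_b": run(rand_seq(7), zero),
}

# Retrieve both states, then compare "mix then run" against "run then mix".
query = rand_seq(3)
weights = [0.5, 0.5]
stored = [library["task_a"], library["task_b"]]
mixed_then_run = run(query, mix(stored, weights))
run_then_mixed = mix([run(query, s) for s in stored], weights)

# The two results agree up to floating-point rounding.
max_gap = max(abs(p - q) for p, q in zip(mixed_then_run, run_then_mixed))
print(max_gap)
```

The equality holds here because each update is affine in the state. In a multi-layer model the inputs to later layers depend on earlier states, so this exact identity should not be assumed to carry over; the abstract reports only preliminary empirical evidence that interpolation helps.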

arXiv information

Authors	Maciej Pióro, Maciej Wołczyk, Razvan Pascanu, Johannes von Oswald, João Sacramento
Published	2024-06-12 17:06:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
