Linear Transformers with Learnable Kernel Functions are Better In-Context Models

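Summary

Advancing subquadratic architectures for language models (LMs) is crucial in the rapidly evolving field of natural language processing.
Recent innovations, including State Space Models, were initially praised for outperforming the Transformer on language modeling tasks.
However, these models turned out to be deficient in essential In-Context Learning capabilities, an area where the Transformer has traditionally excelled.
The Based model emerged as a hybrid solution that combines a Linear Transformer with a kernel inspired by the Taylor expansion of the exponential function, augmented by convolutional networks.
Matching the Transformer's in-context ability, it became a strong contender in the field.
In our work, we present a single, elegant modification to the Based kernel that improves its In-Context Learning abilities, as evaluated on the Multi-Query Associative Recall task, and its overall language modeling performance, as demonstrated on the Pile dataset.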

Summary (original)

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities – a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer’s in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities evaluated with the Multi-Query Associative Recall task and overall language modeling process, as demonstrated on the Pile dataset.
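
To make the phrase "a kernel inspired by the Taylor expansion of exponential functions" concrete, here is a minimal sketch of causal linear attention with a Taylor-style feature map. It assumes a second-order expansion, exp(x) ≈ 1 + x + x²/2, which the abstract does not specify. The names `taylor_features` and `causal_linear_attention` are hypothetical, and this is neither the authors' implementation nor the modification the paper proposes.

```python
from itertools import product


def taylor_features(x):
    """Feature map phi with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2.

    Assumption: second-order Taylor expansion of exp around 0.
    """
    second_order = [a * b / 2 ** 0.5 for a, b in product(x, repeat=2)]
    return [1.0] + list(x) + second_order


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_linear_attention(queries, keys, values):
    """Causal attention computed with running sums instead of a T x T matrix.

    The state has a fixed size, so the cost grows linearly with sequence length.
    """
    feat_dim = len(taylor_features(keys[0]))
    val_dim = len(values[0])
    state = [[0.0] * val_dim for _ in range(feat_dim)]  # running sum of phi(k) v^T
    norm = [0.0] * feat_dim                             # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current key/value pair into the running state first,
        # so position t attends to positions 0..t (causal masking).
        for i, f in enumerate(taylor_features(k)):
            norm[i] += f
            for j, vj in enumerate(v):
                state[i][j] += f * vj
        fq = taylor_features(q)
        denom = dot(fq, norm)  # positive, since 1 + x + x^2/2 > 0 for all real x
        outputs.append(
            [dot(fq, [row[j] for row in state]) / denom for j in range(val_dim)]
        )
    return outputs
```

The feature map grows quadratically with the head dimension, which is the price paid for replacing the softmax with a fixed-size recurrent state.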

arXiv information

Authors	Yaroslav Aksenov,Nikita Balagansky,Sofia Maria Lo Cicero Vaina,Boris Shaposhnikov,Alexey Gorbatovski,Daniil Gavrilov
Published	2024-06-05 14:13:22+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
