SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Abstract

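As large language models continue to grow in size, so do the computational resources required to run them.
Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead of model inference.
SNNs have become competitive with non-spiking models on many computer vision tasks, but they have also proven more challenging to train.
As a result, their performance lags behind modern deep learning, and the effectiveness of SNNs in language generation has yet to be demonstrated.
In this paper, inspired by the RWKV language model, we implement SpikeGPT, a generative language model with pure binary, event-driven spiking activation units.
We train the proposed model in three variants: 45M, 125M, and 260M parameters.
To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date.
We achieve this by modifying the transformer block to replace multi-head self-attention, reducing computational complexity from quadratic to linear in sequence length.
Input tokens are instead streamed sequentially into our attention mechanism (as with typical SNNs).
Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks while consuming 5x less energy when run on neuromorphic hardware that can exploit sparse, event-driven activations.
The code is available at https://github.com/ridgerchu/SpikeGPT.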
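
To make "pure binary, event-driven spiking activation units" concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron in plain Python. The abstract does not say which neuron model SpikeGPT uses, so the LIF dynamics, the soft reset, and the names `lif_forward`, `beta`, and `threshold` are all assumptions chosen for illustration, not the authors' implementation.

```python
def lif_forward(currents, beta=0.9, threshold=1.0):
    """Run one hypothetical LIF neuron over a sequence of input currents.

    Returns a list of binary spikes (0 or 1), one per time step.
    beta (leak factor) and threshold are illustrative values only.
    """
    membrane = 0.0
    spikes = []
    for current in currents:
        # Leak the previous membrane potential, then integrate the new input.
        membrane = beta * membrane + current
        # Emit a binary spike once the potential reaches the threshold.
        spike = 1 if membrane >= threshold else 0
        # Soft reset (an assumption): subtract the threshold after a spike.
        membrane -= spike * threshold
        spikes.append(spike)
    return spikes


if __name__ == "__main__":
    print(lif_forward([0.6, 0.6, 0.0, 0.2, 1.5]))
```

The output is a train of 0s and 1s, and most steps produce no spike. That sparsity is what event-driven hardware can exploit, since downstream work is only needed when a spike occurs.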

Abstract (original)

As the size of large language models continue to scale, so does the computational resources required to run it. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverage sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the RWKV language model, we successfully implement 'SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train the proposed model on three model variants: 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self attention to reduce quadratic computational complexity to linear with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 5x less energy consumption when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
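
The claim that streaming tokens sequentially reduces quadratic complexity to linear can be illustrated with a toy recurrence. The sketch below is not SpikeGPT's attention mechanism, which the abstract does not specify. It is a simplified decayed weighted average with scalar keys and values, loosely in the spirit of recurrent linear-attention models, and the names `streaming_attention`, `quadratic_reference`, and `decay` are hypothetical. It shows that a constant-size running state reproduces what a naive pass over all previous tokens would compute.

```python
import math


def streaming_attention(keys, values, decay=0.9):
    """O(T) version: one pass over the tokens with a constant-size state."""
    num, den = 0.0, 0.0  # running weighted sum and running normaliser
    outputs = []
    for k, v in zip(keys, values):
        weight = math.exp(k)
        # Older tokens fade by `decay`; the new token is added with weight exp(k).
        num = decay * num + weight * v
        den = decay * den + weight
        outputs.append(num / den)
    return outputs


def quadratic_reference(keys, values, decay=0.9):
    """O(T^2) version: revisit every earlier token at each step."""
    outputs = []
    for t in range(len(keys)):
        num = sum(decay ** (t - i) * math.exp(keys[i]) * values[i]
                  for i in range(t + 1))
        den = sum(decay ** (t - i) * math.exp(keys[i])
                  for i in range(t + 1))
        outputs.append(num / den)
    return outputs


if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.0]
    values = [1.0, 2.0, -1.0, 0.5]
    fast = streaming_attention(keys, values)
    slow = quadratic_reference(keys, values)
    # Both routes should agree up to floating-point rounding.
    print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
              for a, b in zip(fast, slow)))
```

The streaming version touches each token once and stores only two numbers, so its cost grows linearly with sequence length, while the reference recomputes a sum over the whole prefix at every step.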

arXiv information

Authors	Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian
Published	2023-02-28 06:28:43+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
