SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

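Summary

Recent large language models (LLMs) with billions of parameters have improved performance across many applications, but their inference demands substantial energy and computational resources. In contrast, the human brain, with roughly 86 billion neurons, is far more energy-efficient than LLMs with a similar number of parameters. Inspired by this, we redesign LLMs with 7 to 70 billion parameters using biologically plausible spiking mechanisms that emulate the efficient behavior of the human brain. We propose SpikeLLM, the first spiking large language model. Alongside the model, we propose two essential approaches for improving spike training efficiency. First, Generalized Integrate-and-Fire (GIF) neurons compress the spike length from $T$ to $\frac{T}{L} \log_2 L$ bits. Second, an Optimal Brain Spiking framework divides outlier channels and allocates a different $T$ to GIF neurons, further compressing the spike length to approximately $\log_2 T$ bits. The necessity of a spike-driven LLM is demonstrated by comparison with quantized LLMs that use similar operations. In the OmniQuant pipeline, SpikeLLM reduces WikiText2 perplexity by 11.01% and improves common scene reasoning accuracy by 2.55% on a LLAMA-7B W4A4 model. In the GPTQ pipeline, SpikeLLM achieves direct additive computation in linear layers and significantly outperforms PB-LLM.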

Abstract (original)

Recent advancements in large language models (LLMs) with billions of parameters have improved performance in various applications, but their inference processes demand significant energy and computational resources. In contrast, the human brain, with approximately 86 billion neurons, is much more energy-efficient than LLMs with similar parameters. Inspired by this, we redesign 7$\sim$70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model, SpikeLLM. Coupled with the proposed model, two essential approaches are proposed to improve spike training efficiency: Generalized Integrate-and-Fire (GIF) neurons to compress spike length from $T$ to $\frac{T}{L} \log_2 L$ bits, and an Optimal Brain Spiking framework to divide outlier channels and allocate different $T$ for GIF neurons, which further compresses spike length to approximately $\log_2 T$ bits. The necessity of spike-driven LLM is proved by comparison with quantized LLMs with similar operations. In the OmniQuant pipeline, SpikeLLM reduces 11.01% WikiText2 perplexity and improves 2.55% accuracy of common scene reasoning on a LLAMA-7B W4A4 model. In the GPTQ pipeline, SpikeLLM achieves direct additive in linear layers, significantly exceeding PB-LLMs.
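
To make the spike-length figures in the abstract concrete, the following is a minimal arithmetic sketch, not the authors' implementation. It only evaluates the expression $\frac{T}{L} \log_2 L$ stated above. The function names, the assumption that a plain binary spike train costs one bit per time step, and the requirement that $L$ divides $T$ are all illustrative assumptions that do not come from the source.

```python
import math


def binary_spike_bits(T: int) -> int:
    """Bits for T binary spike steps, assuming one bit per step (assumption)."""
    return T


def gif_spike_bits(T: int, L: int) -> float:
    """Evaluate the abstract's expression (T / L) * log2(L).

    Hypothetical helper: L is treated as a merge factor that divides T evenly,
    which is an assumption made only to keep the arithmetic clean.
    """
    if L < 2 or T % L != 0:
        raise ValueError("this sketch assumes L >= 2 and L divides T")
    return (T / L) * math.log2(L)


if __name__ == "__main__":
    T = 16
    for L in (2, 4, 16):
        print(f"T={T}, L={L}: {binary_spike_bits(T)} bits -> {gif_spike_bits(T, L)} bits")
```

As a purely arithmetic observation, setting $L = T$ in the expression gives $\log_2 T$ bits. The abstract attributes the reduction to approximately $\log_2 T$ bits to the Optimal Brain Spiking framework and does not say whether this is the mechanism.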

arXiv information

Authors	Xingrun Xing,Boyan Gao,Zheng Zhang,David A. Clifton,Shitao Xiao,Li Du,Guoqi Li,Jiajun Zhang
Published	2025-03-03 06:46:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
