The Super Weight in Large Language Models

Summary

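Recent work has revealed a surprising result: a small fraction of outlier parameters in large language models (LLMs) is disproportionately important to model quality.
Because LLMs contain billions of parameters, even a small fraction such as 0.01% amounts to hundreds of thousands of parameters.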
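To make the scale concrete, here is a back-of-the-envelope calculation. The model sizes are illustrative round numbers, not specific released models, and 0.01% is written as exact integer division by 10,000.

```python
# Back-of-the-envelope: how many parameters is 0.01% of an LLM?
# Model sizes below are illustrative round numbers, not specific models.

def outlier_count(num_params: int) -> int:
    """Parameters covered by 0.01% (= 1/10,000) of a model, using integer math."""
    return num_params // 10_000

for billions in (1, 3, 7):
    n = billions * 10**9
    print(f"{billions}B parameters -> {outlier_count(n):,} parameters at 0.01%")
```

Even a 1B-parameter model yields 100,000 parameters at this fraction, which is why "a small fraction" still means a large absolute set.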
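In this work, we present an even more surprising finding: pruning as few as a single parameter can destroy an LLM's ability to generate text, increasing perplexity by three orders of magnitude and reducing zero-shot accuracy to the level of random guessing.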
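Perplexity is the exponential of the mean per-token negative log-likelihood (NLL), so a three-orders-of-magnitude increase is a statement about that mean. The worked sketch below uses made-up per-token probabilities, purely for illustration; they are not results from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural-log probabilities."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities, chosen only for illustration.
baseline = [math.log(0.2)] * 4     # every token gets probability 0.2
degraded = [math.log(0.0002)] * 4  # every token gets probability 0.0002

print(perplexity(baseline))   # about 5
print(perplexity(degraded))   # about 5000
# A 1000x perplexity increase means the mean NLL grew by ln(1000) nats per token.
print(math.log(1000))         # about 6.908
```

Because the ratio of two perplexities equals exp of the difference in mean NLL, a 1000x jump corresponds to each token becoming, on average, about 1000 times less likely under the model.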
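We propose a data-free method for identifying such parameters, which we call super weights, using a single forward pass through the model.
We additionally find that these super weights induce correspondingly rare and large activation outliers, which we call super activations.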
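The summary states only that super weights are found data-free with one forward pass and that they induce super activations. The toy sketch below illustrates one way such a search could work, under stated assumptions: the super weight sits in a single linear layer, the output channel carrying the super activation gives its row, and the input channel carrying the largest activation gives its column. The layer, the toy numbers, and the helper `locate_super_weight` are hypothetical and are not the authors' code.

```python
def matvec(W, x):
    """y = W x for a dense row-major matrix given as nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def argmax_abs(v):
    """Index of the entry with the largest magnitude."""
    return max(range(len(v)), key=lambda i: abs(v[i]))

def locate_super_weight(W, x):
    """Hypothetical heuristic: after one forward pass, use the output channel
    with the largest |activation| as the row and the input channel with the
    largest |activation| as the column of the candidate super weight."""
    y = matvec(W, x)
    row, col = argmax_abs(y), argmax_abs(x)
    return row, col, y[row]

# Toy layer: small weights everywhere except one large entry at (2, 1).
W = [[0.01 * ((i + j) % 3 - 1) for j in range(4)] for i in range(4)]
W[2][1] = 5.0
# Toy input with an activation spike on channel 1.
x = [0.1, 20.0, -0.2, 0.3]

row, col, spike = locate_super_weight(W, x)
print(row, col, spike)  # row 2, col 1; spike is about 100 (the super activation)

# Pruning that single weight removes the super activation entirely.
W_pruned = [r[:] for r in W]
W_pruned[row][col] = 0.0
print(max(abs(v) for v in matvec(W_pruned, x)))  # about 0.2
```

In this sketch the input is arbitrary, so no calibration dataset is involved; zeroing the one located entry is enough to collapse the largest output from about 100 to about 0.2.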
When super activations are preserved in high precision, simple round-to-nearest quantization improves enough to be competitive with state-of-the-art methods.
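The sketch below shows why one huge activation hurts round-to-nearest (RTN) quantization and how holding it out helps. It assumes symmetric per-tensor absmax INT8 RTN; the paper's exact granularity and bit width are not stated in this summary, and the activation values are invented for illustration.

```python
def rtn_quantize(values, bits=8):
    """Symmetric per-tensor round-to-nearest: quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for INT8
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def rtn_preserving_outlier(values, keep_index, bits=8):
    """Quantize every entry except one, which is kept in full precision."""
    rest = list(values)
    rest[keep_index] = 0.0                           # exclude it from the scale
    out = rtn_quantize(rest, bits)
    out[keep_index] = values[keep_index]             # restore the exact value
    return out

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

acts = [0.013, -0.021, 0.008, 2000.0, 0.017, -0.005]  # one toy super activation
idx = max(range(len(acts)), key=lambda i: abs(acts[i]))

naive = rtn_quantize(acts)
kept = rtn_preserving_outlier(acts, idx)
print(max_error(acts, naive))  # 0.021: every small activation rounds to 0
print(max_error(acts, kept))   # under 1e-4, and the outlier itself is exact
```

With the outlier included, the scale is about 15.7 per integer step, so every ordinary activation collapses to zero; excluding it shrinks the step to about 0.000165.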
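For weight quantization, we similarly find that by preserving the super weight and clipping the other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered.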
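A minimal sketch of block-wise RTN weight quantization with and without the two tricks described above. The 4-bit width, per-block absmax scaling, the block size of 1024, the Gaussian toy weights, and the clip threshold of three standard deviations are all assumptions for illustration; the summary does not specify how the authors choose the clipping threshold.

```python
import random
import statistics

def rtn_blockwise(weights, block_size, bits=4):
    """Round-to-nearest with one absmax scale per contiguous block."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for 4-bit
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) / qmax or 1.0
        out.extend(max(-qmax, min(qmax, round(w / scale))) * scale for w in block)
    return out

def rtn_keep_super_clip_rest(weights, super_idx, block_size, clip, bits=4):
    """Hold the super weight out in full precision, clip every other weight
    to [-clip, clip], then apply block-wise round-to-nearest to the rest."""
    rest = [0.0 if i == super_idx else max(-clip, min(clip, w))
            for i, w in enumerate(weights)]
    out = rtn_blockwise(rest, block_size, bits)
    out[super_idx] = weights[super_idx]
    return out

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # toy weight row
SUPER_IDX = 100                                         # hypothetical coordinate
weights[SUPER_IDX] = 5.0                                # toy super weight

others = [w for i, w in enumerate(weights) if i != SUPER_IDX]
clip = 3 * statistics.pstdev(others)  # hypothetical threshold: 3 std devs

block = 1024  # a large block size
naive = rtn_blockwise(weights, block)
kept = rtn_keep_super_clip_rest(weights, SUPER_IDX, block, clip)
print(f"naive RTN MSE:         {mse(weights, naive):.2e}")
print(f"keep super + clip MSE: {mse(weights, kept):.2e}")
```

In the naive version the super weight sets the scale for its whole block, so every other weight in that block rounds to zero; the larger the block, the more weights it drags down. Holding it out and clipping the remaining tails keeps each block's step size tied to the typical weight magnitude instead.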
To facilitate further research on super weights, we provide an index of super weight coordinates for common, openly available LLMs.

Summary (original)

Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. In this work, we present an even more surprising finding: Pruning as few as a single parameter can destroy an LLM’s ability to generate text — increasing perplexity by 3 orders of magnitude and reducing zero-shot accuracy to guessing. We propose a data-free method for identifying such parameters, termed super weights, using a single forward pass through the model. We additionally find that these super weights induce correspondingly rare and large activation outliers, termed super activations. When preserved with high precision, super activations can improve simple round-to-nearest quantization to become competitive with state-of-the-art methods. For weight quantization, we similarly find that by preserving the super weight and clipping other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered. To facilitate further research into super weights, we provide an index of super weight coordinates for common, openly available LLMs.

arXiv information

Authors	Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan
Published	2024-11-11 18:05:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
