Divergent Token Metrics: Measuring degradation to prune away LLM components — and optimize quantization

Summary

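Large language models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size raises concerns about effective deployment and creates a need for LLM compression. This study introduces Divergent Token Metrics (DTMs), a new approach for evaluating compressed LLMs that addresses the limitations of conventional perplexity and accuracy measures, which fail to accurately reflect text generation quality. By measuring token divergence, DTMs give deeper insight into the subtleties of model compression, especially when the impact of each component is evaluated individually. Applying the First Divergent Token Metric (FDTM) to model sparsification shows that, on the Llama-2 model family, 25% of all attention components can be pruned beyond 90% sparsity while still keeping SOTA performance. For quantization, FDTM suggests that more than 80% of parameters can be naively converted to int8 without special outlier management. These evaluations show that an appropriate compression must be chosen for each parameter individually, and that FDTM can identify those choices, whereas standard metrics lead to degraded results.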
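To make "naively converted to int8 without special outlier management" concrete, here is a minimal sketch of per-tensor symmetric (absmax) round-to-nearest quantization, assuming that is roughly what "naive" means here; the function names are hypothetical and this is not the paper's code. Because a single large outlier sets the scale for the whole tensor, it can squash the remaining small weights into a few integer levels, which is the usual motivation for outlier handling.

```python
from typing import List, Tuple


def quantize_int8_absmax(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical naive int8 quantizer: one scale per tensor, no outlier handling."""
    absmax = max((abs(w) for w in weights), default=0.0)
    if absmax == 0.0:
        # All-zero tensor: any scale works; return zeros with a unit scale.
        return [0] * len(weights), 1.0
    # Map the largest magnitude to 127 so the int8 range [-127, 127] is symmetric.
    scale = absmax / 127.0
    # Round to nearest and clip to guard against tiny floating-point overshoot.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25]
    q, s = quantize_int8_absmax(w)
    print(q, s, dequantize_int8(q, s))
```

With round-to-nearest, each reconstructed weight stays within half a quantization step (`scale / 2`) of the original; the error is uniform across the tensor, so an outlier that inflates `scale` degrades every other weight.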

Summary (original)

Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy measures that fail to accurately reflect text generation quality. DTMs measure token divergences that allow deeper insights into the subtleties of model compression, in particular, when evaluating components’ impacts individually. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that 25% of all attention components can be pruned beyond 90% on the Llama-2 model family, still keeping SOTA performance. For quantization, FDTM suggests that more than 80% of parameters can be naively transformed to int8 without special outlier management. These evaluations indicate the necessity of choosing appropriate compressions for parameters individually — and that FDTM can identify those — while standard metrics result in deteriorated outcomes.
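The idea behind a first-divergent-token measure can be sketched in a few lines. This is a simplified illustration, assuming that the base model's greedy continuation serves as the reference and that the compressed model's greedy prediction is taken at each position of that same continuation, so the two token-id sequences can be compared position by position. The function names, the convention of returning the sequence length when nothing diverges, and the mean aggregation over prompts are all hypothetical choices for this sketch; the paper's exact definitions may differ.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def first_divergent_token(base: Sequence[int], compressed: Sequence[int]) -> int:
    """Index of the first position where the two greedy token sequences differ.

    Returns len(base) if the sequences agree over the whole compared window.
    """
    if len(base) != len(compressed):
        raise ValueError("sequences must be compared over the same window")
    for i, (b, c) in enumerate(zip(base, compressed)):
        if b != c:
            return i
    return len(base)


def share_of_divergent_tokens(base: Sequence[int], compressed: Sequence[int]) -> float:
    """Fraction of positions at which the compressed model's prediction differs."""
    if len(base) != len(compressed):
        raise ValueError("sequences must be compared over the same window")
    if not base:
        return 0.0
    return sum(b != c for b, c in zip(base, compressed)) / len(base)


def fdtm(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Hypothetical aggregate: mean first-divergence index over several prompts."""
    return mean(first_divergent_token(b, c) for b, c in pairs)


if __name__ == "__main__":
    base = [5, 9, 2, 7]
    compressed = [5, 9, 3, 7]
    print(first_divergent_token(base, compressed))      # 2
    print(share_of_divergent_tokens(base, compressed))  # 0.25
```

Unlike an averaged perplexity, a larger first-divergence index directly says how long the compressed model reproduces the original model's greedy output before the first mismatch.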

arXiv information

Authors	Björn Deiseroth, Max Meuer, Nikolas Gritsch, Constantin Eichenberg, Patrick Schramowski, Matthias Aßenmacher, Kristian Kersting
Published	2024-04-03 11:49:53+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
