Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

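Summary

Large language models (LLMs) have demonstrated exceptional capabilities on language-related tasks.
However, their considerable memory and storage requirements make deployment a significant challenge.
In response, weight-only quantization, particularly at 3 and 4 bits, has emerged as one of the most viable solutions.
As the number of bits decreases, the quantization grid becomes coarser, which makes the choice between rounding up and rounding down more important.
Previous studies have shown that fine-tuning the up/down rounding by adding perturbations can improve accuracy in some scenarios; our work is motivated by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value matters.
We therefore propose a concise and highly effective approach to optimizing weight rounding.
Our method, called SignRound, uses lightweight block-wise tuning with signed gradient descent and achieves strong results within 400 steps.
SignRound competes well with recent methods without introducing additional inference overhead.
The source code will be made publicly available soon at \url{https://github.com/intel/neural-compressor}.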

Abstract (original)

Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at \url{https://github.com/intel/neural-compressor} soon.
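
The abstract only names the ingredients (a bounded rounding perturbation, signed gradient descent, a budget of 400 steps), so the following is a minimal sketch of the idea under stated assumptions, not the authors' implementation. It tunes a per-weight perturbation `v` in [-0.5, 0.5] for a single weight vector rather than a transformer block. The symmetric scale, the straight-through gradient estimate, the linearly decaying step size `lr0`, and all function names are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def quantize(w, scale, v, qmin, qmax):
    """Fake-quantize: round(w / scale + v), clamp to the integer grid, rescale."""
    out = []
    for wi, vi in zip(w, v):
        q = math.floor(wi / scale + vi + 0.5)  # round to nearest
        out.append(scale * max(qmin, min(qmax, q)))
    return out


def output_error(xs, w, wq):
    """Squared error between full-precision and quantized layer outputs."""
    diff = [a - b for a, b in zip(wq, w)]
    return sum(dot(x, diff) ** 2 for x in xs)


def tune_rounding(xs, w, bits=3, steps=400, lr0=0.005):
    """Tune the rounding perturbation v with signed gradient descent (sketch)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = max(abs(wi) for wi in w) / qmax  # assumed symmetric per-vector scale
    v = [0.0] * len(w)  # v = 0 is plain round-to-nearest
    best_v = list(v)
    best_err = output_error(xs, w, quantize(w, scale, v, qmin, qmax))
    for step in range(steps):
        wq = quantize(w, scale, v, qmin, qmax)
        diff = [a - b for a, b in zip(wq, w)]
        residuals = [dot(x, diff) for x in xs]
        lr = lr0 * (1.0 - step / steps)  # linear decay (assumed schedule)
        for j in range(len(w)):
            # Straight-through estimator: treat d(round)/dv as 1, so the
            # gradient w.r.t. v[j] has the sign of sum_i residual_i * x_i[j].
            g = sum(r * x[j] for r, x in zip(residuals, xs))
            direction = (g > 0) - (g < 0)
            # Only the sign is used; v stays inside [-0.5, 0.5].
            v[j] = max(-0.5, min(0.5, v[j] - lr * direction))
        err = output_error(xs, w, quantize(w, scale, v, qmin, qmax))
        if err < best_err:  # keep the best perturbation seen so far
            best_err, best_v = err, list(v)
    return best_v, scale, best_err


random.seed(0)
w = [random.uniform(-1, 1) for _ in range(16)]                     # toy weights
xs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]  # toy calibration inputs

_, _, rtn_err = tune_rounding(xs, w, steps=0)   # steps=0 gives round-to-nearest
best_v, scale, tuned_err = tune_rounding(xs, w)
print("round-to-nearest error:", rtn_err)
print("tuned rounding error:  ", tuned_err)
```

Because the perturbation is confined to [-0.5, 0.5], it can only move a weight to the neighbouring grid point, which is why a fixed-size signed step is enough to cross the rounding threshold. Since the sketch keeps the best perturbation seen, the tuned error can never exceed the round-to-nearest error; how much it improves depends on the data.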

arXiv information

Authors	Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv
Published	2023-09-28 09:05:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
