ApiQ: Finetuning of 2-Bit Quantized Large Language Model

Summary

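Memory-efficient finetuning of large language models (LLMs) has recently attracted considerable attention as LLMs grow in size, mainly because of GPU memory limitations and because these methods achieve results comparable to full finetuning.
Despite this progress, current memory-efficient finetuning strategies such as QLoRA perform inconsistently across quantization bit-widths and across diverse tasks.
This inconsistency stems largely from the damage that quantization does to the knowledge stored in the model, which leads to catastrophic forgetting and undermines the use of pretrained models for finetuning.
This work introduces ApiQ, a new quantization framework designed to restore the information lost to quantization by initializing the LoRA components and quantizing the LLM weights at the same time.
The approach preserves the activation precision of the original LLM while mitigating the propagation of error from shallower layers into deeper ones.
Comprehensive evaluations on a range of language tasks with various models show that ApiQ minimizes activation error during quantization.
As a result, it consistently achieves superior finetuning results across quantization bit-widths.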

Summary (original)

Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the comparable results of these methods with full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework named ApiQ, designed to restore the lost information from quantization by concurrently initializing LoRA components and quantizing the weights of LLMs. This approach ensures the maintenance of the original LLM’s activation precision while mitigating the error propagation from shallower into deeper layers. Through comprehensive evaluations conducted on a spectrum of language tasks with various models, ApiQ demonstrably minimizes activation error during quantization. Consequently, it consistently achieves superior finetuning outcomes across various bit-widths of quantization.
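
To make the idea of preserving activations concrete, the following is a minimal NumPy sketch. It is not the authors' procedure, which the summary does not describe in detail. It assumes a toy 2-bit round-to-nearest quantizer and a closed-form low-rank fit, and it compares a correction fitted to the weight error with one fitted to the activation error on a set of calibration inputs. All function names, shapes, and the damping constant are hypothetical choices made for illustration.

```python
import numpy as np

def quantize_uniform(w, bits=2):
    """Toy round-to-nearest uniform quantizer, one scale per output column.
    Returns the dequantized weights (an assumption, not ApiQ's quantizer)."""
    levels = 2 ** bits - 1
    w_min = w.min(axis=0, keepdims=True)
    w_max = w.max(axis=0, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    codes = np.round((w - w_min) / scale)
    return codes * scale + w_min

def lowrank_from_weight_error(err, rank):
    """Best rank-`rank` approximation of the weight error W - Q (truncated SVD)."""
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

def lowrank_from_activation_error(x, err, rank, damp=1e-8):
    """Rank-`rank` factors A, B minimizing ||X (W - Q) - X A B||_F.
    Uses X^T X = R^T R, so ||X M||_F equals ||R M||_F for any M."""
    d_in = x.shape[1]
    gram = x.T @ x
    gram = gram + damp * (np.trace(gram) / d_in) * np.eye(d_in)  # small damping
    r = np.linalg.cholesky(gram).T  # upper-triangular factor, gram = r.T @ r
    u, s, vt = np.linalg.svd(r @ err, full_matrices=False)
    a = np.linalg.solve(r, u[:, :rank] * s[:rank])  # map back through R^{-1}
    return a, vt[:rank]

# Synthetic calibration inputs with uneven feature scales (hypothetical data).
rng = np.random.default_rng(0)
n, d_in, d_out, rank = 256, 32, 16, 4
x = rng.standard_normal((n, d_in)) * np.linspace(0.1, 3.0, d_in)
w = rng.standard_normal((d_in, d_out))
q = quantize_uniform(w, bits=2)
err = w - q

def activation_error(a, b):
    """Frobenius norm of the output mismatch after adding the correction A @ B."""
    return np.linalg.norm(x @ w - x @ (q + a @ b))

base_err = np.linalg.norm(x @ w - x @ q)
weight_err = activation_error(*lowrank_from_weight_error(err, rank))
act_err = activation_error(*lowrank_from_activation_error(x, err, rank))
print(f"quantized only: {base_err:.3f}")
print(f"weight-error fit: {weight_err:.3f}")
print(f"activation-error fit: {act_err:.3f}")
```

In this toy setting the activation-aware fit cannot do worse than the weight-only fit on the calibration inputs (up to the small damping term), because it optimizes the output mismatch directly. A real pipeline would apply such a step layer by layer on actual model activations and then finetune the low-rank factors.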

arXiv information

Authors	Baohao Liao, Christof Monz
Published	2024-02-12 15:09:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
