ApiQ: Finetuning of 2-Bit Quantized Large Language Model

Summary

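As large language models (LLMs) grow in size, memory-efficient finetuning has recently drawn considerable attention. The main reasons are the constraints imposed by limited GPU memory and the effectiveness of these methods relative to full finetuning.
Despite this progress, current memory-efficient finetuning strategies such as QLoRA perform inconsistently across quantization bit-widths and across diverse tasks.
This inconsistency stems largely from the damage quantization does to the knowledge stored in the pretrained model. That damage leads to catastrophic forgetting and undermines the value of the pretrained model as a starting point for finetuning.
This work introduces ApiQ, a new quantization framework designed to restore the information lost in quantization by initializing the LoRA components and quantizing the LLM weights at the same time.
The approach preserves the activation accuracy of the original LLM and limits how much error propagates from shallow layers into deeper ones.
Comprehensive evaluations on a range of language tasks with various LLMs show that ApiQ minimizes activation error during quantization.
As a result, it consistently achieves better finetuning results across bit-widths.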

Summary (original)

Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework, ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. This approach ensures the maintenance of the original LLM’s activation precision while mitigating the error propagation from shallower into deeper layers. Through comprehensive evaluations conducted on a spectrum of language tasks with various LLMs, ApiQ demonstrably minimizes activation error during quantization. Consequently, it consistently achieves superior finetuning results across various bit-widths.
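The following minimal NumPy sketch makes the idea of "preserving activations" more concrete. It assumes a per-layer objective of the form ‖XW − X^q(Q + ABᵀ)‖_F. Here X is the layer input in the full-precision model, X^q is the input produced by the already-quantized earlier layers, Q is the dequantized 2-bit weight, and A and B are the LoRA factors.

Two simplifications stand in for ApiQ's actual procedure and should not be read as the authors' method:

- Q is plain per-column round-to-nearest quantization and is held fixed.
- The low-rank term is solved in closed form by reduced-rank regression, not optimized jointly with the quantizer.

All function names are hypothetical. The toy two-layer network compares three initializations on the second layer, whose input already carries error from the quantized first layer.

```python
import numpy as np


def quantize_rtn(w, bits=2):
    """Asymmetric round-to-nearest quantization, one scale and zero point per output column.

    Returns the dequantized matrix Q that the quantized model actually multiplies by.
    """
    levels = 2 ** bits - 1
    w_min = w.min(axis=0, keepdims=True)
    w_max = w.max(axis=0, keepdims=True)
    scale = np.maximum((w_max - w_min) / levels, 1e-8)
    zero = np.round(-w_min / scale)
    codes = np.clip(np.round(w / scale) + zero, 0, levels)
    return (codes - zero) * scale


def activation_aware_lora(x, x_q, w, q, rank):
    """Hypothetical helper: rank-r (A, B) minimizing ||x @ w - x_q @ (q + A @ B.T)||_F with q fixed.

    Reduced-rank regression: ordinary least squares, then project the fitted
    outputs onto their top-`rank` right singular vectors.
    """
    target = x @ w - x_q @ q  # output error left by the quantized layer
    l_ols, *_ = np.linalg.lstsq(x_q, target, rcond=None)
    _, _, vt = np.linalg.svd(x_q @ l_ols, full_matrices=False)
    v_r = vt[:rank].T  # (d_out, rank)
    return l_ols @ v_r, v_r  # A: (d_in, rank), B: (d_out, rank)


def weight_svd_lora(w, q, rank):
    """Baseline: truncated SVD of the weight error W - Q (matches weights, ignores inputs)."""
    u, s, vt = np.linalg.svd(w - q, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank].T


def activation_error(x, x_q, w, q, a, b):
    """Frobenius norm of the gap between full-precision and adapted quantized outputs."""
    return np.linalg.norm(x @ w - x_q @ (q + a @ b.T))


def relu(z):
    return np.maximum(z, 0.0)


rng = np.random.default_rng(0)
n, d, rank = 512, 64, 4
x0 = rng.standard_normal((n, d))  # calibration inputs
w1 = rng.standard_normal((d, d)) / np.sqrt(d)
w2 = rng.standard_normal((d, d)) / np.sqrt(d)

# Layer 1: both models see the same input, so x_q is just x0 here.
q1 = quantize_rtn(w1)
a1, b1 = activation_aware_lora(x0, x0, w1, q1, rank)

# Layer 2: the quantized model's input already carries layer-1 error.
x = relu(x0 @ w1)
x_q = relu(x0 @ (q1 + a1 @ b1.T))
q2 = quantize_rtn(w2)

zeros = np.zeros((d, rank))  # adapter product starts at zero (LoRA default)
errors = {
    "zero init": activation_error(x, x_q, w2, q2, zeros, zeros),
    "weight-space SVD": activation_error(x, x_q, w2, q2, *weight_svd_lora(w2, q2, rank)),
    "activation-aware": activation_error(x, x_q, w2, q2, *activation_aware_lora(x, x_q, w2, q2, rank)),
}
for name, err in errors.items():
    print(f"{name:>18}: {err:.3f}")
```

The activation-aware factors are the rank-r optimum of exactly the quantity being measured. Their error therefore can be no larger than that of a zero start, where the adapter is initially a no-op. It also can be no larger than that of a truncated SVD of W − Q, which matches weights only and ignores both the input distribution and the error inherited from layer 1.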

arXiv information

Authors	Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz
Published	2024-06-21 14:03:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
