Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

Summary

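Low-rank adapters have become a standard approach for efficiently fine-tuning large language models (LLMs), but they often fail to match the performance of full fine-tuning.
We propose LoRA Silver Bullet (LoRA-SB), a method that approximates full fine-tuning within a low-rank subspace using a carefully designed initialization strategy.
We show theoretically that the LoRA-XS architecture, which inserts a trainable (r x r) matrix between B and A while keeping the other matrices fixed, provides exactly the conditions this approximation requires.
Exploiting its constrained update space, we achieve optimal scaling of high-rank gradient updates while eliminating the need for hyperparameter tuning.
We prove that our initialization gives an optimal low-rank approximation of the initial gradient and preserves update directions throughout training.
Extensive experiments on mathematical reasoning, commonsense reasoning, and language understanding tasks show that our approach outperforms standard LoRA while using 27 to 90 times fewer parameters, and comprehensively outperforms LoRA-XS.
Our findings establish that full fine-tuning can be simulated in low-rank subspaces, yielding significant efficiency gains without sacrificing performance.
Our code is publicly available at https://github.com/RaghavSinghal10/lora-sb.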
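
To make the architecture and initialization idea concrete, here is a minimal NumPy sketch. It assumes the adapted weight has the form W0 + B R A, with B and A fixed and only the (r x r) matrix R trainable, and it uses a truncated SVD to build the best rank-r approximation of a matrix standing in for the initial gradient (by the Eckart-Young theorem, truncated SVD is optimal in Frobenius norm). The function names, the dimensions, and the random "gradient" are illustrative assumptions; the authors' actual procedure, including any scaling, is in their repository and may differ.

```python
import numpy as np


def truncated_svd_init(grad, r):
    """Return (B, R, A) such that B @ R @ A is the best rank-r
    approximation of `grad` in Frobenius norm (Eckart-Young)."""
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    B = U[:, :r]        # (m, r), kept fixed during training
    R = np.diag(S[:r])  # (r, r), the only trainable matrix
    A = Vt[:r, :]       # (r, n), kept fixed during training
    return B, R, A


def adapted_weight(W0, B, R, A):
    """Weight after adaptation: frozen W0 plus the low-rank term B R A."""
    return W0 + B @ R @ A


# Hypothetical sizes and a random stand-in for the initial gradient.
rng = np.random.default_rng(0)
m, n, r = 64, 48, 4
G = rng.standard_normal((m, n))

B, R, A = truncated_svd_init(G, r)
approx = B @ R @ A

# The approximation error equals the norm of the discarded singular values.
err = np.linalg.norm(G - approx)
tail = np.sqrt(np.sum(np.linalg.svd(G, compute_uv=False)[r:] ** 2))
print(np.isclose(err, tail))
```

Because B and A have orthonormal columns and rows respectively in this sketch, every update to R stays inside the subspace chosen at initialization, which is the sense in which the update space is constrained.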

Abstract (original)

Low-rank adapters have become a standard approach for efficiently fine-tuning large language models (LLMs), but they often fall short of achieving the performance of full fine-tuning. We propose a method, LoRA Silver Bullet or LoRA-SB, that approximates full fine-tuning within low-rank subspaces using a carefully designed initialization strategy. We theoretically demonstrate that the architecture of LoRA-XS, which inserts a trainable (r x r) matrix between B and A while keeping other matrices fixed, provides the precise conditions needed for this approximation. We leverage its constrained update space to achieve optimal scaling for high-rank gradient updates while removing the need for hyperparameter tuning. We prove that our initialization offers an optimal low-rank approximation of the initial gradient and preserves update directions throughout training. Extensive experiments across mathematical reasoning, commonsense reasoning, and language understanding tasks demonstrate that our approach exceeds the performance of standard LoRA while using 27-90x fewer parameters, and comprehensively outperforms LoRA-XS. Our findings establish that it is possible to simulate full fine-tuning in low-rank subspaces, and achieve significant efficiency gains without sacrificing performance. Our code is publicly available at https://github.com/RaghavSinghal10/lora-sb.
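
The parameter savings follow from simple counting, sketched below. For one weight matrix of shape (m x n), standard LoRA trains B (m x r) and A (r x n), so r(m + n) parameters, while an architecture that trains only an (r x r) matrix between fixed B and A has r^2 trainable parameters. The layer size and ranks used here are hypothetical and chosen only for illustration; they are not the configurations behind the 27-90x figure reported in the abstract.

```python
def lora_trainable_params(m, n, r):
    """Standard LoRA on an (m x n) weight: B is (m x r), A is (r x n)."""
    return r * (m + n)


def rxr_trainable_params(r):
    """Only the (r x r) matrix between fixed B and A is trained."""
    return r * r


# Hypothetical layer size and ranks (assumptions, not taken from the paper).
m, n = 4096, 4096
lora = lora_trainable_params(m, n, r=16)  # 16 * 8192 = 131072
rxr = rxr_trainable_params(r=64)          # 64 * 64  = 4096
print(lora, rxr, lora / rxr)              # prints: 131072 4096 32.0
```

Note that the r^2 count does not depend on the layer dimensions m and n, so the ratio grows with model width for a fixed rank.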

arXiv information

Authors	Kaustubh Ponkshe, Raghav Singhal, Eduard Gorbunov, Alexey Tumanov, Samuel Horvath, Praneeth Vepakomma
Published	2024-11-29 09:10:30+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
