BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models

Abstract (original)

This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. BAdam offers a memory efficient approach to the full parameter finetuning of large language models and reduces running time of the backward process thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior in comparison to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models using the MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is available at https://github.com/Ledzy/BAdam.
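The abstract describes BAdam only at a high level. Below is a minimal, pure-Python sketch of cyclic block coordinate descent with Adam as the inner solver. It rests on several assumptions. The parameters are split into fixed blocks that are visited in a fixed order. Each block receives K Adam steps, starting from freshly zeroed moment estimates. A toy quadratic loss stands in for the language-model loss. All names (`badam_sketch`, `block_grad`, `block_epochs`, `inner_steps`) are hypothetical and are not the API of the Ledzy/BAdam repository.

```python
import math


def loss(x, target):
    """Toy separable quadratic standing in for the LLM training loss (assumption)."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def block_grad(x, target, block):
    """Gradient of `loss` for the active block's coordinates only.

    Frozen blocks get no parameter gradient, loosely mirroring how a network's
    backward pass can skip parameter gradients for blocks that are not trained.
    """
    return [x[i] - target[i] for i in block]


def badam_sketch(x, target, blocks, block_epochs=3, inner_steps=50, lr=0.1,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """Cyclic block coordinate descent with Adam as the inner solver (hypothetical)."""
    x = list(x)  # work on a copy; the caller's list is left untouched
    for _ in range(block_epochs):
        for block in blocks:  # visit blocks in a fixed cyclic order
            # Adam moments are allocated only for the active block and are
            # discarded when we move on, so optimizer state scales with the
            # block size rather than with the full parameter count.
            m = [0.0] * len(block)
            v = [0.0] * len(block)
            for t in range(1, inner_steps + 1):  # K inner Adam steps on this block
                g = block_grad(x, target, block)
                for j, i in enumerate(block):
                    m[j] = beta1 * m[j] + (1 - beta1) * g[j]
                    v[j] = beta2 * v[j] + (1 - beta2) * g[j] ** 2
                    m_hat = m[j] / (1 - beta1 ** t)  # bias correction
                    v_hat = v[j] / (1 - beta2 ** t)
                    x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0, 0.5]
    x = badam_sketch([0.0] * 4, target, blocks=[[0, 1], [2, 3]])
    print("final loss:", loss(x, target))
```

In a real model, each block would more plausibly be a group of transformer layers than a pair of scalars. Only the active block carries Adam moments, so optimizer memory tracks the largest block rather than the whole model. Whether the published method resets the moments exactly this way should be checked against the paper and code.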

arXiv information

Authors	Qijun Luo, Hengxu Yu, Xiao Li
Published	2024-04-03 15:59:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
