LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

Abstract (original)

We present LongQLoRA, an efficient and effective method to extend the context length of large language models with fewer training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA and the Shift Short Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k, within 1000 finetuning steps. LongQLoRA achieves competitive perplexity performance on the PG19 and Proof-pile datasets: our model outperforms LongLoRA and is very close to MPT-7B-8K within the evaluation context length of 8192. We collect and build 39k long instruction samples to extend the context length of Vicuna-13B from 4096 to 8192 and achieve good performance in both long and short context generation tasks. We also conduct ablation experiments to study the effect of LoRA rank, finetuning steps and attention patterns in inference. The model weights, training data and code are available at https://github.com/yangjianxin1/LongQLoRA.
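The abstract names Position Interpolation and LongLoRA's Shift Short Attention without spelling out how they work. The following is a minimal plain-Python sketch of the two ideas under stated assumptions; it is not the LongQLoRA code from the repository. The function names are hypothetical, the RoPE base of 10000 is an assumed default, and the shifted groups mimic a circular roll over a sequence whose length is a multiple of the group size. Position Interpolation linearly rescales position indices so a longer target length maps back into the trained range. In LongLoRA's shifted short attention, half of the attention heads use groups offset by half a group so that neighbouring groups exchange information.

```python
import math
from typing import List


def rope_angles(position: float, head_dim: int, base: float = 10000.0) -> List[float]:
    """RoPE rotation angles for one position: position * base^(-2i/d).

    Assumption: standard rotary embedding with base 10000.
    """
    return [position * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def interpolate_position(position: int, train_len: int, target_len: int) -> float:
    """Position Interpolation: linearly rescale a position index so that
    positions in [0, target_len) fall back into the trained range [0, train_len)."""
    return position * train_len / target_len


def shifted_groups(seq_len: int, group_size: int, shift: bool) -> List[List[int]]:
    """Token indices that attend to each other under shifted short attention.

    Unshifted heads attend within consecutive groups of `group_size` tokens.
    Shifted heads use groups offset by group_size // 2, wrapping around the end
    of the sequence the way a circular roll would.
    """
    assert seq_len % group_size == 0, "sketch assumes seq_len divisible by group_size"
    offset = group_size // 2 if shift else 0
    order = [(j + offset) % seq_len for j in range(seq_len)]
    return [order[k:k + group_size] for k in range(0, seq_len, group_size)]


if __name__ == "__main__":
    # Hypothetical setting: trained on 4096 tokens, extended to 8192.
    last = interpolate_position(8191, train_len=4096, target_len=8192)
    print(last)  # 4095.5
    # The interpolated position is what gets fed into the RoPE angles.
    print(rope_angles(last, head_dim=128)[0])  # 4095.5
    print(shifted_groups(8, 4, shift=False))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(shifted_groups(8, 4, shift=True))   # [[2, 3, 4, 5], [6, 7, 0, 1]]
```

With LLaMA2's 4096-token training length and an 8192-token target, the last position 8191 maps to 4095.5, inside the range seen during pre-training. With 8 tokens and a group size of 4, unshifted heads see groups [0..3] and [4..7]. Shifted heads see [2..5] and [6, 7, 0, 1].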

arXiv information

Author: Jianxin Yang
Published: 2023-11-08 18:33:06+00:00

Source, services used

arxiv.jp, Google
