DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

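Summary

Low-rank adaptation (LoRA) reduces the compute and memory cost of fine-tuning large language models (LLMs) by approximating weight updates with low-rank matrices.
However, a low-rank approximation in two-dimensional space cannot capture high-dimensional structure within the target matrix.
Tensor decomposition methods, which can extract structured information, have recently been explored for fine-tuning LLMs.
These approaches rely mainly on random initialization, and the effect of initialization on tensor adaptation has not yet been examined closely.
This paper shows that, with random initialization, the validation loss diverges significantly from the loss achieved by full fine-tuning.
To address this, the authors propose Weight-Decomposed Tensor Adaptation (DoTA), which uses the Matrix Product Operator (MPO) decomposition of the pre-trained weights to initialize LLM fine-tuning effectively.
They also introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization.
In experiments on commonsense and arithmetic reasoning tasks, DoTA outperforms random initialization methods while using fewer parameters.
QDoTA further reduces memory consumption and achieves performance comparable to DoTA on commonsense reasoning tasks.
The authors state that they will release their code to support future research.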

Summary (original)

Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet, these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we reveal that random initialization significantly diverges from the validation loss achieved by full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in fine-tuning LLMs. Additionally, we introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms random initialization methods with fewer parameters. QDoTA further reduces memory consumption and achieves comparable performance to DoTA on commonsense reasoning tasks. We will release our code to support future research.
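To make the MPO decomposition mentioned in the abstract more concrete, here is a minimal NumPy sketch of the generic sequential-SVD (tensor-train style) construction for factoring a weight matrix into a chain of 4-way cores. It is an illustration under stated assumptions, not the authors' implementation: the function names `mpo_decompose` and `mpo_reconstruct`, the `max_rank` truncation parameter, and the example dimensions are all hypothetical, and the abstract does not specify how DoTA chooses ranks or which cores it trains.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank=None):
    """Factor W (prod(in_dims) x prod(out_dims)) into MPO cores by sequential SVD.

    Core k has shape (r_{k-1}, in_dims[k], out_dims[k], r_k), with r_0 = r_n = 1.
    max_rank (hypothetical parameter) caps every bond rank; None keeps all of them.
    """
    n = len(in_dims)
    assert len(out_dims) == n
    assert W.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    # Reshape to (i1..in, j1..jn), then interleave the axes to (i1, j1, i2, j2, ...).
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([ax for k in range(n) for ax in (k, n + k)])
    cores, rank = [], 1
    for k in range(n - 1):
        # Unfold: rows hold the current core's indices, columns hold everything left.
        T = T.reshape(rank * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        new_rank = len(S) if max_rank is None else min(max_rank, len(S))
        cores.append(U[:, :new_rank].reshape(rank, in_dims[k], out_dims[k], new_rank))
        # Carry the singular values and right factor forward to the next step.
        T = S[:new_rank, None] * Vt[:new_rank]
        rank = new_rank
    cores.append(T.reshape(rank, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_reconstruct(cores):
    """Contract the cores back into a dense matrix."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        # Contract the trailing bond axis of T with the leading bond axis of core.
        T = np.tensordot(T, core, axes=1)
    T = T.reshape(T.shape[1:-1])  # drop the two boundary bonds of size 1
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1..in, j1..jn).
    T = T.transpose([2 * k for k in range(n)] + [2 * k + 1 for k in range(n)])
    rows = int(np.prod([c.shape[1] for c in cores]))
    cols = int(np.prod([c.shape[2] for c in cores]))
    return T.reshape(rows, cols)

# Example with made-up sizes: a 12 x 12 matrix split into three cores.
rng = np.random.default_rng(0)
W = rng.standard_normal((12, 12))
cores = mpo_decompose(W, (2, 3, 2), (2, 2, 3))
W_hat = mpo_reconstruct(cores)
truncated = mpo_decompose(W, (2, 3, 2), (2, 2, 3), max_rank=2)
```

Without truncation the cores reproduce `W` up to floating-point error; with `max_rank` set, the cores hold fewer parameters at the cost of an approximation error. An initialization scheme built on this idea would start fine-tuning from factors of the pre-trained weights rather than from random tensors.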

arXiv information

Authors	Xiaolin Hu, Xiang Cheng, Peiyu Liu, Wei Liu, Jian Luan, Bin Wang, Yong Liu
Published	2024-12-30 12:00:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
