One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

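Summary

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on downstream tasks for specific applications.
The most successful and most widely used fine-tuning method updates the pre-trained weights via low-rank adaptation (LoRA).
LoRA introduces new weight matrices that are usually initialized at random, with the same rank assigned uniformly across all model weights.
Recent work focuses either on weight-driven initialization or on learning adaptive ranks during training.
Because the two approaches have only been studied in isolation, they suffer from slow convergence or a uniform rank distribution, which in turn leads to sub-optimal performance.
We propose to enhance LoRA by initializing the new weights in a data-driven manner, computing a singular value decomposition (SVD) on minibatches of activation vectors.
We then initialize the LoRA matrices with the resulting right-singular vectors, redistribute ranks among all weight matrices so as to explain the maximal amount of variance, and continue with the standard LoRA fine-tuning procedure.
The result is a new method, Explained Variance Adaptation (EVA).
We apply EVA to a variety of fine-tuning tasks, ranging from language generation and understanding to image classification and reinforcement learning.
EVA converges faster than competing methods and attains the highest average score across a multitude of tasks per domain.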
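
The initialization and rank redistribution described above can be illustrated with a small NumPy sketch. It is only a rough reading of the abstract, not the authors' implementation: the function name `eva_style_init`, the `total_rank` and `max_rank` parameters, the use of a single batch with a full SVD (rather than an incremental SVD over many minibatches), and the uncentered explained-variance ratio are all assumptions made for illustration.

```python
import numpy as np


def eva_style_init(activations, total_rank, max_rank):
    """Hypothetical sketch of a data-driven LoRA initialization.

    activations: dict mapping a layer name to an (n_samples, d_in) array
                 of inputs collected for that layer's weight matrix.
    total_rank:  rank budget shared by all layers (assumed parameter).
    max_rank:    cap on the rank any single layer may receive (assumed).
    Returns a dict mapping each layer name to its matrix A of shape
    (r_layer, d_in), whose rows are right-singular vectors.
    """
    candidates = []  # (explained-variance ratio, layer name, vector)
    for name, X in activations.items():
        # Rows of Vt are the right-singular vectors of the activation batch.
        _, S, Vt = np.linalg.svd(X, full_matrices=False)
        ratio = S**2 / np.sum(S**2)  # share of (uncentered) variance
        for k in range(min(max_rank, len(S))):
            candidates.append((ratio[k], name, Vt[k]))

    # Redistribute ranks: keep the components that explain the most
    # variance across all layers, whichever layer they come from.
    candidates.sort(key=lambda c: c[0], reverse=True)
    chosen = candidates[:total_rank]

    A = {}
    for name, X in activations.items():
        rows = [vec for _, layer, vec in chosen if layer == name]
        A[name] = np.stack(rows) if rows else np.zeros((0, X.shape[1]))
    return A
```

In this sketch a layer whose activations lie almost entirely along one direction receives a single rank, and the remaining budget goes to layers whose variance is spread over more directions. The companion matrix B would start at zero, as in standard LoRA, so the adapted model initially matches the pre-trained one.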

Abstract (original)

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus on weight-driven initialization or learning of adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and re-distribute ranks among all weight matrices to explain the maximal amount of variance and continue the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.

arXiv information

Authors	Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter
Published	2024-10-09 17:59:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
