Activated LoRA: Fine-tuned LLMs for Intrinsics

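Abstract

Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for fine-tuning the weights of large foundation models, and it has become the go-to method for data-driven customization of LLMs.
Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multi-turn setting is highly inefficient, because the key-value (KV) cache of the entire turn history must be recomputed with the LoRA weights before generation can begin.
To address this problem, we propose Activated LoRA (aLoRA), which modifies the LoRA framework so that weights are adapted only for the tokens in the sequence after the aLoRA is invoked.
With this change, aLoRA can accept the base model's KV cache of the input string, so an aLoRA can be activated instantly whenever it is needed in a chain, without recomputing the cache.
This makes it possible to build what we call intrinsics: highly specialized models that are invoked to perform well-defined operations on portions of an input chain or conversation that otherwise uses the base model by default.
We use aLoRA to train a set of intrinsics models, demonstrating accuracy competitive with standard LoRA while achieving significant inference benefits.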
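
The cache-reuse argument in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it models a single key projection in pure Python, and the names `project`, `adapted_from`, and `invoke_at`, along with the matrix values, are assumptions made for illustration. The low-rank update `B(Ax)` is added only at positions at or after the invocation point, so the keys of earlier tokens match what the base model would have cached.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def project(tokens, W, A, B, adapted_from):
    """Project each token with W, adding the low-rank update B(Ax)
    only at positions >= adapted_from (hypothetical helper)."""
    out = []
    for pos, x in enumerate(tokens):
        y = matvec(W, x)
        if pos >= adapted_from:
            y = add(y, matvec(B, matvec(A, x)))
        out.append(y)
    return out


# Toy sizes (assumed): hidden size 2, adapter rank 1.
W = [[1.0, 0.0], [0.0, 1.0]]  # base key projection
A = [[1.0, 1.0]]              # down-projection, shape 1x2
B = [[0.5], [-0.5]]           # up-projection, shape 2x1
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
invoke_at = 2                 # adapter is invoked before the third token

base_keys = project(tokens, W, A, B, adapted_from=len(tokens))  # never adapted
lora_keys = project(tokens, W, A, B, adapted_from=0)            # always adapted
alora_keys = project(tokens, W, A, B, adapted_from=invoke_at)   # adapted after invocation

# Keys for the prefix (positions before invoke_at) under each scheme.
cache_reusable_alora = alora_keys[:invoke_at] == base_keys[:invoke_at]
cache_reusable_lora = lora_keys[:invoke_at] == base_keys[:invoke_at]
print(cache_reusable_alora, cache_reusable_lora)
```

In this sketch the prefix keys under the activated scheme equal the base keys, while the always-adapted scheme changes them, which is why a standard LoRA would need the prefix cache recomputed. A real transformer has many layers and also caches values, so this only illustrates the idea for one projection.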

Abstract (original)

Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for finetuning the weights of large foundation models, and has become the go-to method for data-driven customization of LLMs. Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multiturn setting is highly inefficient, as the key-value (KV) cache of the entire turn history must be recomputed with the LoRA weights before generation can begin. To address this problem, we propose Activated LoRA (aLoRA), which modifies the LoRA framework to only adapt weights for the tokens in the sequence after the aLoRA is invoked. This change crucially allows aLoRA to accept the base model's KV cache of the input string, meaning that aLoRA can be instantly activated whenever needed in a chain without recomputing the cache. This enables building what we call intrinsics, i.e. highly specialized models invoked to perform well-defined operations on portions of an input chain or conversation that otherwise uses the base model by default. We use aLoRA to train a set of intrinsics models, demonstrating competitive accuracy with standard LoRA while achieving significant inference benefits.

arXiv information

Authors	Kristjan Greenewald, Luis Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David Cox
Published	2025-04-29 14:25:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
