DiSHA: Dimension-Sharding Adaptation of Large Language Models with Fast Convergence and Fast Computation

Summary

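Low-Rank Adaptation (LoRA), a prominent technique within the Parameter-Efficient Fine-Tuning (PEFT) framework, efficiently reduces the computational burden of adapting Large Language Models (LLMs) to downstream tasks, thereby enabling fine-tuning under resource constraints.
However, existing research has shown that LoRA suffers from slow convergence.
To address this limitation, we introduce Dimension-Sharding Adaptation (DiSHA), which expands the PEFT design space to even fewer trainable parameters and faster convergence.
Within DiSHA's design space, we propose Block Affine Efficient Computation (Bone), a computationally efficient structure that delivers both high performance and efficiency.
Certain DiSHA configurations may result in collinear updates to weight shards; we address this with Block Affine Transformation (Bat), a nonlinear variant of DiSHA.
Bat combines trainable matrices with the original weight shards in a nonlinear manner, inducing nonlinearity in the matrix updates without introducing additional parameters.
Empirical results show that Bone, under the DiSHA framework, consistently outperforms LoRA variants on both Natural Language Understanding and Natural Language Generation tasks, with significantly improved computational efficiency.
Further analysis demonstrates that Bat enhances model capabilities by leveraging its nonlinear design.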
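
The abstract gives no formulas, so the following is only a hypothetical sketch of what a dimension-sharded update could look like, not the authors' implementation. It assumes a weight matrix split into equal row shards, with a single trainable block added to every shard. The function names, the additive form of the update, and the shapes are all assumptions made for illustration. The sketch shows why such a configuration needs few trainable parameters, and why every shard then receives the same update.

```python
# Hypothetical illustration only: the names, shapes, and the additive shared
# update below are assumptions and are not taken from the abstract.

def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of a rank-`rank` LoRA update (two low-rank factors)."""
    return rank * (d_in + d_out)


def shared_shard_param_count(d_out, shard_rows):
    """Trainable parameters when one (shard_rows x d_out) block is shared by all shards."""
    return shard_rows * d_out


def apply_shared_shard_update(weight, shared):
    """Add the same trainable block to every row shard of `weight`.

    weight: list of d_in rows, each a list of d_out floats
    shared: list of shard_rows rows, each a list of d_out floats
    """
    shard_rows = len(shared)
    if len(weight) % shard_rows != 0:
        raise ValueError("d_in must be divisible by the shard height")
    # Row i sits at position (i % shard_rows) inside its shard, so it receives
    # the matching row of the shared block.
    return [
        [w + s for w, s in zip(row, shared[i % shard_rows])]
        for i, row in enumerate(weight)
    ]


if __name__ == "__main__":
    weight = [[0.0, 0.0] for _ in range(4)]   # toy 4 x 2 weight matrix
    shared = [[1.0, 2.0], [3.0, 4.0]]         # one trainable 2 x 2 block
    print(apply_shared_shard_update(weight, shared))
    print(lora_param_count(4096, 4096, 8), shared_shard_param_count(4096, 8))
```

In this toy setting both shards of the 4 x 2 matrix receive an identical update, which is the kind of tied update across shards that a nonlinear variant would be meant to break.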

Abstract (original)

Low-Rank Adaptation (LoRA), a prominent technique within the framework of Parameter-Efficient Fine-Tuning (PEFT), efficiently reduces the computational burden associated with adapting Large Language Models (LLMs) to downstream tasks, thereby enabling resource-constrained fine-tuning. However, existing research has shown that LoRA suffers from slow convergence. To address this limitation, we introduce Dimension-Sharding Adaptation (DiSHA), which expands the PEFT design space to even fewer trainable parameters and faster convergence. Within DiSHA’s design space, we propose Block Affine Efficient Computation (Bone), a computationally efficient structure that delivers both high performance and efficiency. While certain DiSHA configurations may result in colinear updates to weight shards, we address this with Block Affine Transformation (Bat), a nonlinear variant of DiSHA. Bat introduces nonlinearity by combining trainable matrices with original weight shards in a nonlinear manner, inducing nonlinearity in matrix updates without introducing additional parameters. Empirical results show that Bone, under the DiSHA framework, consistently outperforms LoRA variants in both Natural Language Understanding and Natural Language Generation tasks, with significantly improved computational efficiency. Further analysis demonstrates that Bat enhances model capabilities by leveraging its nonlinear design.

arXiv information

Author	Jiale Kang
Published	2025-01-28 09:15:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
