PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

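Summary

Transformers have recently grown rapidly in scale, creating considerable challenges in training overhead and inference efficiency for task adaptation.
Existing work, namely parameter-efficient fine-tuning (PEFT) and model compression, has investigated these challenges separately.
However, PEFT cannot guarantee the inference efficiency of the original backbone, especially for large-scale models.
Model compression, in turn, requires significant training cost for structure search and re-training.
Consequently, simply combining the two cannot guarantee both training efficiency and inference efficiency at minimal cost.
This paper proposes a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA first uses parallel yielding adaptive weights to comprehensively perceive the data distribution of the downstream task.
A re-activation strategy for token modulation is then applied to the tokens that are about to be merged, yielding calibrated token features.
Extensive experiments show that PYRA outperforms all competing methods at both low and high compression rates, demonstrating its effectiveness in maintaining both training efficiency and inference efficiency for large-scale foundation models.
The code is available at https://github.com/THU-MIG/PYRA.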

Abstract (original)

Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, especially for large-scale models. Model compression requires significant training costs for structure searching and re-training. Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs. In this paper, we propose a novel Parallel Yielding Re-Activation (PYRA) method for such a challenge of training-inference efficient task adaptation. PYRA first utilizes parallel yielding adaptive weights to comprehensively perceive the data distribution in downstream tasks. A re-activation strategy for token modulation is then applied for tokens to be merged, leading to calibrated token features. Extensive experiments demonstrate that PYRA outperforms all competing methods under both low compression rate and high compression rate, demonstrating its effectiveness and superiority in maintaining both training efficiency and inference efficiency for large-scale foundation models. Our code is available at https://github.com/THU-MIG/PYRA.
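
The abstract gives only a high-level description of the method, so the following is a rough illustrative sketch and not the authors' implementation (see the linked repository for that). It assumes NumPy, and every name in it (`modulate_then_merge`, `w_tok`, `w_dim`, the index arrays) is hypothetical. It also assumes that the two small weight vectors are combined by an outer product, that the "re-activation" is a sigmoid gate, and that merging is a plain pairwise average; none of these details are stated in the abstract.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def modulate_then_merge(tokens, src_idx, dst_idx, w_tok, w_dim):
    """Hypothetical sketch: calibrate tokens that are about to be merged.

    tokens  : (n, d) array of token features
    src_idx : (r,) indices of tokens that will be merged away
    dst_idx : (r,) indices of the tokens they are merged into
              (assumed distinct and disjoint from src_idx)
    w_tok   : (r,) assumed learnable vector over the merged tokens
    w_dim   : (d,) assumed learnable vector over the feature dimension
    """
    tokens = np.asarray(tokens, dtype=float)
    src = tokens[src_idx]                      # (r, d) tokens to be merged

    # Two small vectors generated in parallel give an (r, d) modulation matrix
    # at a cost of r + d parameters instead of r * d (an assumption here).
    delta = np.outer(w_tok, w_dim)

    # Assumed re-activation: a sigmoid gate scaled so that all-zero weights
    # leave the tokens unchanged (2 * sigmoid(0) == 1).
    calibrated = src * (2.0 * sigmoid(delta))

    # Assumed merge rule: average each calibrated token with its destination.
    merged = tokens.copy()
    merged[dst_idx] = 0.5 * (tokens[dst_idx] + calibrated)

    # Drop the source tokens, so the sequence shrinks from n to n - r.
    keep = np.setdiff1d(np.arange(tokens.shape[0]), src_idx)
    return merged[keep]
```

Under these assumptions, calling the function with zero-valued `w_tok` and `w_dim` reduces to plain average merging, which makes it easy to compare a modulated merge against an unmodulated one on the same tokens.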

arXiv information

Authors	Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding
Published	2024-07-16 14:34:35+00:00

Sources and services used

arxiv.jp, Google
