Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

Summary

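Parameter-Efficient Transfer Learning (PETL) aims to efficiently adapt large models pre-trained on massive data to downstream tasks with limited task-specific data.
Given the practical focus of PETL, previous work has tuned a small set of parameters for each downstream task in an end-to-end manner, while rarely considering the task distribution shift between the pre-training task and the downstream task.
This paper proposes a new two-stage paradigm in which the pre-trained model is first aligned to the target distribution.
Task-relevant information is then leveraged for effective adaptation.
Specifically, the first stage narrows the task distribution shift by tuning the scale and shift parameters of the LayerNorm layers.
In the second stage, to learn task-relevant information efficiently, a Taylor expansion-based importance score identifies the channels relevant to the downstream task, and only this small portion of channels is tuned, which makes the adaptation parameter-efficient.
Overall, the paper presents a promising new direction for PETL, and the proposed paradigm achieves state-of-the-art average accuracy across 19 downstream tasks.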
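
The first stage tunes only the per-channel scale and shift of LayerNorm while everything else stays frozen. The pure-Python sketch below illustrates that idea on a toy problem. It is not the authors' implementation. The frozen linear head `w_frozen` is a hypothetical stand-in for the frozen backbone. The "downstream" targets are synthetic, generated with an assumed scale of 2.0 and shift of 0.5 to mimic a distribution shift. Gradients for `gamma` and `beta` are written out by hand.

```python
import math
import random

EPS = 1e-5  # numerical-stability constant, as in common LayerNorm implementations


def layer_norm(x, gamma, beta):
    """LayerNorm over one feature vector: normalize, then per-channel scale (gamma) and shift (beta)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    x_hat = [(v - mu) / math.sqrt(var + EPS) for v in x]
    return [g * h + b for g, h, b in zip(gamma, x_hat, beta)], x_hat


def predict(x, gamma, beta, w):
    """LayerNorm followed by a frozen linear head w (hypothetical stand-in for the frozen backbone)."""
    y, x_hat = layer_norm(x, gamma, beta)
    return sum(wi * yi for wi, yi in zip(w, y)), x_hat


def mse(data, gamma, beta, w):
    """Mean squared error of the toy model over (input, target) pairs."""
    return sum((predict(x, gamma, beta, w)[0] - t) ** 2 for x, t in data) / len(data)


def tune_layernorm(data, w, steps=200, lr=0.05):
    """Stage-1 sketch: gradient descent on gamma and beta only; w is never updated."""
    d = len(w)
    gamma, beta = [1.0] * d, [0.0] * d  # standard LayerNorm initialization
    for _ in range(steps):
        g_gamma, g_beta = [0.0] * d, [0.0] * d
        for x, t in data:
            out, x_hat = predict(x, gamma, beta, w)
            d_out = 2.0 * (out - t) / len(data)  # d(MSE)/d(out)
            for i in range(d):
                d_y = d_out * w[i]              # d(out)/d(y_i) = w_i
                g_gamma[i] += d_y * x_hat[i]    # d(y_i)/d(gamma_i) = x_hat_i
                g_beta[i] += d_y                # d(y_i)/d(beta_i) = 1
        gamma = [g - lr * dg for g, dg in zip(gamma, g_gamma)]
        beta = [b - lr * db for b, db in zip(beta, g_beta)]
    return gamma, beta


random.seed(0)
D = 4
w_frozen = [random.uniform(-1.0, 1.0) for _ in range(D)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(32)]
# Hypothetical "downstream" targets, generated with a shifted scale (2.0) and shift (0.5).
data = [(x, predict(x, [2.0] * D, [0.5] * D, w_frozen)[0]) for x in inputs]

before = mse(data, [1.0] * D, [0.0] * D, w_frozen)
gamma, beta = tune_layernorm(data, w_frozen)
after = mse(data, gamma, beta, w_frozen)
print(f"trainable: {2 * D} LayerNorm params; MSE before {before:.4f}, after {after:.4f}")
```

Only `2 * D` numbers are trained here, yet the error on the shifted targets drops. This is the intuition behind using LayerNorm affine parameters as a cheap alignment step. In a real framework one would freeze all weights and leave only the LayerNorm weight and bias trainable, rather than deriving gradients by hand.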

Summary (original)

Parameter-Efficient Transfer Learning (PETL) aims at efficiently adapting large models pre-trained on massive data to downstream tasks with limited task-specific data. In view of the practicality of PETL, previous works focus on tuning a small set of parameters for each downstream task in an end-to-end manner while rarely considering the task distribution shift issue between the pre-training task and the downstream task. This paper proposes a novel two-stage paradigm, where the pre-trained model is first aligned to the target distribution. Then the task-relevant information is leveraged for effective adaptation. Specifically, the first stage narrows the task distribution shift by tuning the scale and shift in the LayerNorm layers. In the second stage, to efficiently learn the task-relevant information, we propose a Taylor expansion-based importance score to identify task-relevant channels for the downstream task and then only tune such a small portion of channels, making the adaptation to be parameter-efficient. Overall, we present a promising new direction for PETL, and the proposed paradigm achieves state-of-the-art performance on the average accuracy of 19 downstream tasks.
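
The second stage scores channels with a Taylor expansion and tunes only the top-scoring ones. The abstract does not give the exact formula, so this sketch assumes the common first-order Taylor criterion from channel pruning (Molchanov et al.). Zeroing channel `c` changes the loss by roughly `dL/dh_c * (0 - h_c)`, so a channel's score is the absolute batch mean of `h_c * dL/dh_c`. The toy linear head, the activations, `k`, and the choice to update the head weights `v` of the kept channels are all hypothetical illustrations. They are not the paper's design.

```python
def forward_backward(acts, v, targets):
    """Toy head out = sum_c v[c] * h[c] with summed loss (out - t)^2 over the batch.

    Returns dL/dh for every sample and channel, plus dL/dv per channel.
    """
    grad_h, grad_v = [], [0.0] * len(v)
    for h, t in zip(acts, targets):
        d_out = 2.0 * (sum(vc * hc for vc, hc in zip(v, h)) - t)
        grad_h.append([d_out * vc for vc in v])  # d(out)/d(h_c) = v_c
        for c, hc in enumerate(h):
            grad_v[c] += d_out * hc              # d(out)/d(v_c) = h_c
    return grad_h, grad_v


def taylor_channel_importance(acts, grads):
    """First-order Taylor estimate of the loss change when each channel is zeroed.

    Setting h_c to 0 changes the loss by about dL/dh_c * (0 - h_c), so the
    score is |mean over samples of h_c * dL/dh_c|.
    """
    n = len(acts)
    return [abs(sum(acts[i][c] * grads[i][c] for i in range(n)) / n)
            for c in range(len(acts[0]))]


def top_k_mask(scores, k):
    """True for the k highest-scoring channels, False elsewhere."""
    keep = set(sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k])
    return [c in keep for c in range(len(scores))]


def masked_sgd_step(params, grads, mask, lr):
    """Update only the parameters of selected channels; the others stay frozen."""
    return [p - lr * g if keep else p for p, g, keep in zip(params, grads, mask)]


acts = [[1.0, 0.1, 2.0, 0.0],   # hypothetical channel activations for 2 samples
        [1.0, -0.1, 2.0, 0.5]]
v = [1.0, 1.0, 1.0, 1.0]
targets = [0.0, 0.0]

grad_h, grad_v = forward_backward(acts, v, targets)
scores = taylor_channel_importance(acts, grad_h)
mask = top_k_mask(scores, k=2)
print(mask)  # [True, False, True, False]
v = masked_sgd_step(v, grad_v, mask, lr=0.01)
```

Channel 1 contributes almost nothing to the loss: its activation flips sign between samples, so the products cancel. It is therefore left frozen, along with channel 3, and only channels 0 and 2 receive updates.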

arXiv information

Authors	Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou
Published	2023-03-14 13:50:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
