READ: Recurrent Adaptation of Large Transformers

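Summary

Fine-tuning large-scale Transformers has driven a rapid expansion of AI applications across natural language processing and computer vision tasks.
However, as model size and the number of tasks grow, fine-tuning every parameter of a pre-trained model becomes impractical.
Parameter-efficient transfer learning (PETL) methods aim to address these challenges.
PETL methods are effective at reducing the number of trainable parameters, but fine-tuning still requires substantial energy and computational resources.
This paper introduces REcurrent ADaption (READ), a lightweight and memory-efficient fine-tuning method, to overcome the limitations of current PETL approaches.
Specifically, READ inserts a small RNN alongside the backbone model so that training does not have to back-propagate through the large backbone network.
Through a comprehensive empirical evaluation on the GLUE benchmark, the authors show that, compared with full tuning, READ reduces training memory consumption by 56% and GPU energy usage by 84% while retaining high model quality.
In addition, READ's model size does not grow with the size of the backbone model, which makes it a highly scalable solution for fine-tuning large Transformers.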
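
The idea of a side network that avoids back-propagating through the backbone can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the paper's architecture: the scalar "backbone" layers, the three-parameter recurrent side network, the additive output correction, and every function name are hypothetical. It only shows that gradients can be computed from cached backbone activations without touching the frozen weights.

```python
import math

def backbone_forward(x, frozen_weights):
    """Run the frozen toy backbone and cache each layer's hidden state."""
    hidden, h = [], x
    for w in frozen_weights:
        h = math.tanh(w * h)
        hidden.append(h)
    return hidden

def side_forward(hidden, params):
    """Tiny recurrent side network that steps over the layer axis.

    Assumed form: s_i = tanh(a * s_{i-1} + b * h_i), output = h_N + c * s_N.
    """
    a, b, c = params
    states = [0.0]
    for h in hidden:
        states.append(math.tanh(a * states[-1] + b * h))
    return hidden[-1] + c * states[-1], states

def side_gradients(hidden, states, params, dy):
    """Back-propagate dL/dy through the side network only.

    The backbone weights never appear here; only cached activations are used.
    """
    a, b, c = params
    grad_a = grad_b = 0.0
    grad_c = dy * states[-1]
    ds = dy * c  # gradient with respect to the final side state
    for i in range(len(hidden) - 1, -1, -1):
        s = states[i + 1]
        dpre = ds * (1.0 - s * s)  # derivative of tanh
        grad_a += dpre * states[i]
        grad_b += dpre * hidden[i]
        ds = dpre * a
    return grad_a, grad_b, grad_c

def train(frozen_weights, steps=500, lr=0.1):
    """Fit only the side-network parameters with plain gradient descent."""
    data = [(x, -0.5 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
    params = [0.1, 0.5, 0.0]  # c = 0: the side network starts as a no-op
    losses = []
    for _ in range(steps):
        grads, total = [0.0, 0.0, 0.0], 0.0
        for x, target in data:
            hidden = backbone_forward(x, frozen_weights)
            y, states = side_forward(hidden, params)
            err = y - target
            total += 0.5 * err * err
            g = side_gradients(hidden, states, params, err)
            grads = [acc + gi for acc, gi in zip(grads, g)]
        params = [p - lr * g / len(data) for p, g in zip(params, grads)]
        losses.append(total / len(data))
    return losses, params

if __name__ == "__main__":
    frozen = [1.3, 1.2, 0.9, 0.8]
    losses, params = train(frozen)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

Because the side network has a fixed number of parameters regardless of how many backbone layers feed it, this toy also mirrors the abstract's point that the adapter does not grow with the backbone.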

Abstract (original)

Fine-tuning large-scale Transformers has led to the explosion of many AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size and number of tasks increase. Parameter-efficient transfer learning (PETL) methods aim to address these challenges. While effective in reducing the number of trainable parameters, PETL methods still require significant energy and computational resources to fine-tune. In this paper, we introduce REcurrent ADaption (READ), a lightweight and memory-efficient fine-tuning method, to overcome the limitations of the current PETL approaches. Specifically, READ inserts a small RNN network alongside the backbone model so that the model does not have to back-propagate through the large backbone network. Through comprehensive empirical evaluation on the GLUE benchmark, we demonstrate READ can achieve a 56% reduction in the training memory consumption and an 84% reduction in the GPU energy usage while retaining high model quality compared to full-tuning. Additionally, the model size of READ does not grow with the backbone model size, making it a highly scalable solution for fine-tuning large Transformers.

arXiv information

Authors	Sid Wang, John Nguyen, Ke Li, Carole-Jean Wu
Published	2023-05-24 16:59:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
