Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning

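Summary

Efficient finetuning of pretrained language transformers is increasingly used to solve natural language processing tasks.
Although effective, it can still require a large number of tunable parameters.
This is a drawback for low-resource applications and for training under differential-privacy constraints, where finetuning may introduce excessive noise.
To address this, we propose a new finetuning strategy for language transformers that introduces task-specific parameters in multiple transformer layers.
These parameters are derived from fixed random projections of a single trainable vector, which allows finetuning with significantly fewer parameters while maintaining performance.
On GLUE tasks, we achieve within 5% of full finetuning performance with as few as 4,100 parameters per task, outperforming other parameter-efficient finetuning approaches that use a similar number of per-task parameters.
In addition, the random projections can be precomputed at inference, avoiding additional computational latency.
Together, these properties make our method particularly appealing for low-resource applications.
Finally, when trained under the same privacy constraints, our method achieves the best or comparable utility compared with several recent finetuning methods, underscoring its effectiveness and potential real-world impact.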
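
The central idea, per-layer parameters derived from fixed random projections of one trainable vector, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the per-layer parameters are additive shift vectors computed as W_l z, that the projections are Gaussian, and that the toy sizes below are representative; the function names (make_projections, layer_shifts, apply_shift) are hypothetical and none of these details come from the abstract.

```python
import random

def make_projections(num_layers, hidden_dim, vec_dim, seed=0):
    """Build one fixed random matrix per layer (assumed Gaussian, never trained).

    Seeding makes the matrices reproducible, so only the seed and the
    trainable vector would need to be stored per task.
    """
    rng = random.Random(seed)
    scale = vec_dim ** -0.5  # assumed scaling, chosen only for this illustration
    return [
        [[rng.gauss(0.0, 1.0) * scale for _ in range(vec_dim)]
         for _ in range(hidden_dim)]
        for _ in range(num_layers)
    ]

def layer_shifts(projections, z):
    """Derive each layer's parameters as the matrix-vector product W_l z."""
    return [
        [sum(w * x for w, x in zip(row, z)) for row in W]
        for W in projections
    ]

def apply_shift(hidden, shift):
    """Add a layer's derived shift to one hidden-state vector (assumed usage)."""
    return [h + s for h, s in zip(hidden, shift)]

# Toy sizes, far smaller than a real transformer.
NUM_LAYERS, HIDDEN_DIM, VEC_DIM = 4, 8, 3

projections = make_projections(NUM_LAYERS, HIDDEN_DIM, VEC_DIM, seed=42)
z = [0.5, -1.0, 2.0]  # stand-in for the single trained vector

# Computed once after training; inference then only adds the stored shifts.
shifts = layer_shifts(projections, z)

trainable = len(z)                     # parameters actually tuned
derived = sum(len(s) for s in shifts)  # per-layer parameters obtained from z
```

Because the projections are fixed, the shifts depend linearly on z and can be computed once after training and stored, which is how the sketch reflects the claim that inference incurs no extra latency.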

Summary (original)

Efficient finetuning of pretrained language transformers is becoming increasingly prevalent for solving natural language processing tasks. While effective, it can still require a large number of tunable parameters. This can be a drawback for low-resource applications and training with differential-privacy constraints, where excessive noise may be introduced during finetuning. To this end, we propose a novel language transformer finetuning strategy that introduces task-specific parameters in multiple transformer layers. These parameters are derived from fixed random projections of a single trainable vector, enabling finetuning with significantly fewer parameters while maintaining performance. We achieve within 5% of full finetuning performance on GLUE tasks with as few as 4,100 parameters per task, outperforming other parameter-efficient finetuning approaches that use a similar number of per-task parameters. Besides, the random projections can be precomputed at inference, avoiding additional computational latency. All these make our method particularly appealing for low-resource applications. Finally, our method achieves the best or comparable utility compared to several recent finetuning methods when training with the same privacy constraints, underscoring its effectiveness and potential real-world impact.

arXiv information

Authors	Umang Gupta, Aram Galstyan, Greg Ver Steeg
Published	2023-05-30 17:55:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
