Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning

Abstract

Recently, foundation models based on Vision Transformers (ViTs) have become widely available.
However, their fine-tuning process is highly resource-intensive, which hinders their adoption in several edge or low-energy applications.
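To this end, this paper introduces an efficient fine-tuning method for ViTs called $\textbf{ALaST}$ ($\textit{Adaptive Layer Selection Fine-Tuning for Vision Transformers}$), which speeds up fine-tuning while reducing computational cost, memory load, and training time.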
Our approach is based on the observation that not all layers are equally critical during fine-tuning, and that their importance varies with the current mini-batch.
Therefore, at each fine-tuning step, we adaptively estimate the importance of every layer and assign what we call “compute budgets” accordingly.
Layers allocated lower budgets are either trained with fewer input tokens or kept frozen.
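Freezing a layer prevents updates to its weights, which reduces computational cost and memory usage. Discarding tokens removes redundant data, which speeds up processing and lowers memory requirements.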
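We show that this adaptive compute allocation yields a nearly optimal schedule for distributing computational resources across layers. Compared with traditional full fine-tuning, it substantially reduces training time (up to 1.5x), FLOPs (up to 2x), and memory load (up to 2x).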
Additionally, it can be successfully combined with other parameter-efficient fine-tuning methods, such as LoRA.
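The abstract does not spell out three things: how layer importance is measured, how a budget turns into "frozen" versus "fewer tokens," and which tokens are discarded. The plain-Python sketch below shows one plausible reading of that per-step allocation. Several parts are assumptions for illustration and are not the ALaST procedure: the proportional budget rule, the `freeze_below` threshold, the choice that frozen layers still run forward on all tokens, and ranking tokens by class-token attention. All names (`LayerPlan`, `allocate_budgets`, `select_tokens`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerPlan:
    """What one ViT layer does at the current fine-tuning step (hypothetical)."""
    budget: float      # this layer's share of compute, clipped to [0, 1]
    trainable: bool    # False -> weights frozen for this step
    keep_tokens: int   # tokens the layer processes in this step


def allocate_budgets(importance, total_budget, num_tokens, freeze_below=0.5):
    """Map per-layer importance scores to per-layer compute plans.

    Assumed rule (not from the paper): budgets are importance scores
    rescaled so their mean equals `total_budget` before clipping to [0, 1].
    Layers below `freeze_below` are frozen and run forward on all tokens;
    the rest are trained on a token count proportional to their budget.
    """
    mean_imp = sum(importance) / len(importance)
    plans = []
    for score in importance:
        # Fall back to a uniform budget if the scores carry no signal.
        ratio = score / mean_imp if mean_imp > 0 else 1.0
        budget = min(1.0, max(0.0, total_budget * ratio))
        if budget < freeze_below:
            plans.append(LayerPlan(budget, False, num_tokens))
        else:
            keep = max(1, round(budget * num_tokens))
            plans.append(LayerPlan(budget, True, keep))
    return plans


def select_tokens(scores, keep):
    """Indices of the tokens a trained layer keeps (hypothetical rule).

    Token 0 is assumed to be the class token and is always kept. The other
    slots go to the highest-scoring patch tokens (for instance, scored by
    class-token attention, which is an assumption). Order is preserved.
    """
    patches = sorted(range(1, len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted([0] + patches[: max(0, keep - 1)])
```

Take importance scores `[1.0, 3.0, 2.0, 2.0]`, `total_budget=0.5`, and 196 tokens. The budgets come out as 0.25, 0.75, 0.5, and 0.5. The first layer is frozen, and under this sketch's rule it still sees all 196 tokens in the forward pass. The other three layers are trained on 147, 98, and 98 tokens. This sketch also assumes that a layer skipping tokens passes them through unchanged.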

Abstract (original)

Recently, foundation models based on Vision Transformers (ViTs) have become widely available. However, their fine-tuning process is highly resource-intensive, and it hinders their adoption in several edge or low-energy applications. To this end, in this paper we introduce an efficient fine-tuning method for ViTs called $\textbf{ALaST}$ ($\textit{Adaptive Layer Selection Fine-Tuning for Vision Transformers}$) to speed up the fine-tuning process while reducing computational cost, memory load, and training time. Our approach is based on the observation that not all layers are equally critical during fine-tuning, and their importance varies depending on the current mini-batch. Therefore, at each fine-tuning step, we adaptively estimate the importance of all layers and we assign what we call “compute budgets” accordingly. Layers that were allocated lower budgets are either trained with a reduced number of input tokens or kept frozen. Freezing a layer reduces the computational cost and memory usage by preventing updates to its weights, while discarding tokens removes redundant data, speeding up processing and reducing memory requirements. We show that this adaptive compute allocation enables a nearly-optimal schedule for distributing computational resources across layers, resulting in substantial reductions in training time (up to 1.5x), FLOPs (up to 2x), and memory load (up to 2x) compared to traditional full fine-tuning approaches. Additionally, it can be successfully combined with other parameter-efficient fine-tuning methods, such as LoRA.
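The reductions quoted above come from the authors' own measurements. To see why freezing and token dropping both cut cost, the rough model below makes two simplifications. First, it counts only the matrix multiplications of a standard ViT encoder block, at 2 FLOPs per multiply-add. Second, it assumes that input gradients and weight gradients each cost about one forward pass. These are common rules of thumb, not the paper's accounting, and `block_forward_flops` and `step_flops` are hypothetical helpers.

```python
def block_forward_flops(n_tokens, dim, mlp_ratio=4):
    """Approximate forward FLOPs of one standard ViT encoder block.

    Counts matrix multiplications only, at 2 FLOPs per multiply-add:
    QKV and output projections (4*d^2 MACs per token), the MLP
    (2*mlp_ratio*d^2 MACs per token), and the attention products
    QK^T and AV (2*N^2*d MACs in total). LayerNorm, softmax and GELU
    are ignored.
    """
    linear_macs = n_tokens * (4 + 2 * mlp_ratio) * dim * dim
    attn_macs = 2 * n_tokens * n_tokens * dim
    return 2 * (linear_macs + attn_macs)


def step_flops(layers, dim):
    """Rough FLOPs of one training step for (trainable, n_tokens) layers.

    `layers` is ordered from input to output. Assumed rule of thumb: input
    gradients and weight gradients each cost about one forward pass. A layer
    needs input gradients only if some layer below it is trainable, and
    weight gradients only if it is trainable itself.
    """
    total = 0
    trainable_below = False
    for trainable, n_tokens in layers:
        fwd = block_forward_flops(n_tokens, dim)
        input_grads = fwd if trainable_below else 0
        weight_grads = fwd if trainable else 0
        total += fwd + input_grads + weight_grads
        trainable_below = trainable_below or trainable
    return total
```

Consider 12 identical layers. Under this model, fully fine-tuning all of them costs 35 forward-block units, because the first layer needs no input gradients. Freezing the bottom six lowers this to 6 + 2 + 15 = 23 units. Halving a layer's tokens halves its linear term and quarters its attention term. The model does not reproduce the paper's reported savings, which depend on the authors' models, budgets, and hardware.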

arXiv information

Authors	Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane
Published	2024-08-16 11:27:52+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
