Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models

Summary

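As their parameters and computation keep growing, vision-language pre-trained (VLP) models incur prohibitive costs when adapted to downstream tasks.
Recent efforts mainly focus on parameter efficient transfer learning (PETL) for VLP models, which updates only a small number of parameters.
However, excessive computational overhead still hinders the application of VLP models.
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for VLP models.
In particular, PCETL must not only limit the number of trainable parameters in VLP models but also reduce computational redundancy during inference, enabling a more efficient transfer.
To approach this goal, we propose a novel dynamic architecture skipping (DAS) approach for effective PCETL.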
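Instead of directly optimizing the intrinsic architecture of VLP models, DAS first observes the significance of their modules to downstream tasks through a reinforcement learning (RL) based process, and then, according to the obtained rewards, skips the redundant modules with lightweight networks, i.e., adapters.
In this way, the VLP model keeps the scale of its trainable parameters small while its inference on downstream tasks is accelerated.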
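The two-stage idea above (observe module significance with RL, then skip the least significant modules) can be illustrated with a toy sketch. It assumes a gradient-bandit formulation (REINFORCE with a running-average baseline) in which each episode skips one layer and receives the negative of a loss as reward; this is not the authors' implementation. The names `toy_skip_loss`, `observe_significance`, `choose_layers_to_skip` and the `importance` scores are hypothetical stand-ins for evaluating a VLP model on a downstream validation set.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(h - m) for h in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_skip_loss(layer, importance):
    # HYPOTHETICAL stand-in for the downstream validation loss after replacing
    # `layer` with an adapter: skipping an important layer costs more.
    return importance[layer]


def observe_significance(importance, steps=2000, lr=0.1, alpha=0.1, seed=0):
    """Gradient-bandit estimate of which layers are cheapest to skip.

    Returns one skip probability per layer (higher = more redundant)."""
    rng = random.Random(seed)
    n = len(importance)
    logits = [0.0] * n  # preference for skipping each layer
    baseline = 0.0      # running average of rewards, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        layer = rng.choices(range(n), weights=probs, k=1)[0]
        reward = -toy_skip_loss(layer, importance)
        advantage = reward - baseline
        baseline += alpha * (reward - baseline)
        # Policy-gradient step: d log pi(layer) / d h_i = 1[i == layer] - p_i
        for i in range(n):
            grad = (1.0 if i == layer else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


def choose_layers_to_skip(skip_probs, num_skip):
    # Replace with adapters the layers the policy most prefers to drop.
    ranked = sorted(range(len(skip_probs)), key=lambda i: skip_probs[i], reverse=True)
    return sorted(ranked[:num_skip])


if __name__ == "__main__":
    # HYPOTHETICAL per-layer importance scores for a 6-layer toy model.
    importance = [0.9, 0.05, 0.8, 0.02, 0.7, 0.6]
    probs = observe_significance(importance)
    print(choose_layers_to_skip(probs, num_skip=2))
```

With these made-up scores, the two layers with the smallest importance (indices 1 and 3) should end up with the highest skip probabilities. The bandit view is a deliberate simplification: the reward here is a fixed lookup, whereas a real system would measure it by running the partially skipped model.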
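To validate DAS, we apply it to two representative VLP models, namely ViLT and METER, and conduct extensive experiments on a number of VL tasks.
The experimental results not only show the great advantage of DAS in reducing computational complexity, e.g., -11.97% FLOPs for METER on VQA2.0, but also confirm its competitiveness against existing PETL methods in terms of parameter scale and performance.
The source code is given in the appendix.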
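The PCETL goal combines two costs: trainable parameters and inference FLOPs. A minimal accounting sketch makes the trade-off concrete. It assumes that every layer carries one adapter and that a skipped layer keeps only its adapter, so its backbone FLOPs disappear. The function `transfer_cost` and all numbers below are hypothetical and unrelated to the -11.97% figure reported for METER.

```python
def transfer_cost(layer_flops, skipped, adapter_flops, adapter_params):
    """Return (relative FLOPs change vs. the frozen backbone, trainable params).

    HYPOTHETICAL accounting: one adapter per layer; a skipped layer's backbone
    computation is removed and only its adapter runs."""
    base = sum(layer_flops)
    kept = sum(f for i, f in enumerate(layer_flops) if i not in skipped)
    total = kept + adapter_flops * len(layer_flops)
    trainable = adapter_params * len(layer_flops)
    return (total - base) / base, trainable


if __name__ == "__main__":
    layers = [1000] * 12  # HYPOTHETICAL: 12 layers of 1000 MFLOPs each
    for skipped in (set(), {3, 7}):
        rel, params = transfer_cost(layers, skipped, adapter_flops=10, adapter_params=50_000)
        print(sorted(skipped), f"{rel:+.2%}", params)
    # prints:
    # [] +1.00% 600000
    # [3, 7] -15.67% 600000
```

Under these assumptions, plain adapter tuning keeps the trainable parameters small but slightly increases FLOPs, while skipping two layers keeps the same trainable parameter count and lowers FLOPs, which is the combination PCETL asks for.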

Summary (original)

With ever increasing parameters and computation, vision-language pre-trained (VLP) models exhibit prohibitive expenditure in downstream task adaption. Recent endeavors mainly focus on parameter efficient transfer learning (PETL) for VLP models by only updating a small number of parameters. However, excessive computational overhead still plagues the application of VLPs. In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for VLP models. In particular, PCETL not only needs to limit the number of trainable parameters in VLP models, but also to reduce the computational redundancy during inference, thus enabling a more efficient transfer. To approach this target, we propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL. Instead of directly optimizing the intrinsic architectures of VLP models, DAS first observes the significances of their modules to downstream tasks via a reinforcement learning (RL) based process, and then skips the redundant ones with lightweight networks, i.e., adapters, according to the obtained rewards. In this case, the VLP model can well maintain the scale of trainable parameters while speeding up its inference on downstream tasks. To validate DAS, we apply it to two representative VLP models, namely ViLT and METER, and conduct extensive experiments on a bunch of VL tasks. The experimental results not only show the great advantages of DAS in reducing computational complexity, e.g. -11.97% FLOPs of METER on VQA2.0, but also confirm its competitiveness against existing PETL methods in terms of parameter scale and performance. Our source code is given in our appendix.

arXiv information

Authors	Qiong Wu, Wei Yu, Yiyi Zhou, Shubin Huang, Xiaoshuai Sun, Rongrong Ji
Published	2023-09-06 10:23:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
