Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers

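Abstract

Large-scale pretrained models, especially those trained on vision-language data, have demonstrated the tremendous value that can be gained from both larger training datasets and larger models.
To benefit from these developments, there is renewed interest in transfer learning and in adapting models from large-scale general pretraining to particular downstream tasks.
However, the continuously increasing size of the models means that even the classic approach of finetuning is becoming infeasible for all but big institutions.
Prompt learning has emerged as a flexible way to adapt a model by learning only additional inputs to it while the model itself is kept frozen, but so far its performance has remained inferior to finetuning.
To address this, the authors propose the Prompt Generation Network (PGN), which generates input-dependent prompts by sampling from a learned library of tokens.
They show that the PGN is effective in adapting pretrained models to various new datasets.
It surpasses previous prompt-learning methods by a large margin and even outperforms full finetuning on 5 out of 12 datasets, while requiring 100x fewer parameters.
PGN can also be used for training and inference on multiple datasets simultaneously, in which case it learns to allocate tokens between domains.
Given these findings, the authors conclude that PGN is a viable and scalable approach for downstream adaptation of frozen models.
Code is available at https://github.com/jochemloedeman/PGN.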
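
The abstract describes input-dependent prompts built from a learned library of tokens and fed to a frozen model as additional inputs. The following is a minimal NumPy sketch of one possible reading of that idea, not the authors' implementation (see the repository above for that). It assumes a softmax-weighted mixture over the library in place of literal sampling, a single linear map standing in for the prompt-generating network, and prompts that are prepended to the patch tokens. All names (`generate_prompts`, `prepend_prompts`, `w`, `library`) and all sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def generate_prompts(features, w, library, num_prompts):
    """Build input-dependent prompts from a shared token library.

    features: (batch, feat_dim) features of each input image (assumed given)
    w:        (feat_dim, num_prompts * lib_size) weights of a hypothetical
              linear layer standing in for the lightweight generator network
    library:  (lib_size, token_dim) token library (learned in a real setup)
    returns:  (batch, num_prompts, token_dim) prompts
    """
    batch = features.shape[0]
    lib_size = library.shape[0]
    # One distribution over the library per prompt slot and per input.
    logits = (features @ w).reshape(batch, num_prompts, lib_size)
    weights = softmax(logits, axis=-1)
    # Assumption: each prompt is a convex combination of library tokens.
    return weights @ library


def prepend_prompts(prompts, patch_tokens):
    """Concatenate prompts in front of the patch tokens of a frozen model."""
    return np.concatenate([prompts, patch_tokens], axis=1)


# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
B, F, L, D, P, T = 2, 8, 16, 32, 4, 10
features = rng.normal(size=(B, F))
w = rng.normal(size=(F, P * L))
library = rng.normal(size=(L, D))
patch_tokens = rng.normal(size=(B, T, D))

prompts = generate_prompts(features, w, library, P)
tokens = prepend_prompts(prompts, patch_tokens)
print(prompts.shape, tokens.shape)
```

In this sketch only `w` and `library` would be trained; the backbone that consumes `tokens` stays frozen, which is what keeps the number of trainable parameters small.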

Abstract (original)

Large-scale pretrained models, especially those trained from vision-language data, have demonstrated the tremendous value that can be gained from both larger training datasets and models. Thus, in order to benefit from these developments, there is renewed interest in transfer learning and adapting models from large-scale general pretraining to particular downstream tasks. However, the continuously increasing size of the models means that even the classic approach of finetuning is becoming infeasible for all but big institutions. Prompt learning has emerged as a flexible way to adapt models by solely learning additional inputs to a model that is kept frozen, but so far performances remained inferior to finetuning. To address this, we propose the Prompt Generation Network (PGN) that generates input-dependent prompts by sampling from a learned library of tokens. We show the PGN is effective in adapting pretrained models to various new datasets. It surpasses previous prompt-learning methods by a large margin and even full finetuning on 5 out of 12 datasets while requiring 100x fewer parameters. PGN can even be used for training and inferring on multiple datasets simultaneously and learns to allocate tokens between domains. Given these findings, we conclude that PGN is a viable and scalable approach for downstream adaptation of frozen models. Code is available at https://github.com/jochemloedeman/PGN.

arXiv information

Authors	Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano
Published	2022-10-12 17:59:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
