Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning

Abstract

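Prompt tuning, which learns task-specific soft prompts that condition a frozen pre-trained model on a downstream task, has attracted growing interest because of its parameter efficiency.
With a large language model and sufficient training data, prompt tuning performs comparably to full-model tuning.
In few-shot settings with limited training samples, however, prompt tuning does not match the performance of full-model fine-tuning.
This work focuses on improving the few-shot performance of prompt tuning by transferring knowledge from the soft prompts of source tasks.
Recognizing the good generalization of ensemble methods in the low-data regime, the authors first show experimentally that a simple ensemble of model predictions based on different source prompts outperforms existing multi-prompt knowledge transfer approaches, such as source prompt fusion, in the few-shot setting.
Motivated by this observation, they investigate model ensembles further and propose the Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model separately for each target sample when ensembling the source models' outputs.
In this way, SESoM inherits the superior generalization of model ensemble approaches while also capturing the sample-specific competence of each source prompt.
Experiments across a diverse set of eight NLP tasks with models of different scales (T5-{base, large, XL}) show that SESoM consistently outperforms existing models of the same and of larger parametric scale by a large margin.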

Abstract (original)

Prompt tuning approaches, which learn task-specific soft prompts for a downstream task conditioning on frozen pre-trained models, have attracted growing interest due to their parameter efficiency. With large language models and sufficient training data, prompt tuning performs comparably to full-model tuning. However, with limited training samples in few-shot settings, prompt tuning fails to match the performance of full-model fine-tuning. In this work, we focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks. Recognizing the good generalization capabilities of ensemble methods in the low-data regime, we first experiment and show that a simple ensemble of model predictions based on different source prompts outperforms existing multi-prompt knowledge transfer approaches such as source prompt fusion in the few-shot setting. Motivated by this observation, we further investigate model ensembles and propose Sample-specific Ensemble of Source Models (SESoM). SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs. In this way, SESoM inherits the superior generalization of model ensemble approaches and simultaneously captures the sample-specific competence of each source prompt. We conduct experiments across a diverse set of eight NLP tasks using models of different scales (T5-{base, large, XL}) and find that SESoM consistently outperforms the existing models of the same as well as larger parametric scales by a large margin.
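
The contrast between a simple ensemble and a sample-specific one can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how SESoM computes its per-sample weights, so the `scores` argument below is a hypothetical stand-in for whatever learned module produces them, and the function names and toy numbers are assumptions made only for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_ensemble(source_probs: List[List[float]]) -> List[float]:
    """Simple ensemble: average the class distributions of all source models."""
    k = len(source_probs)
    n_classes = len(source_probs[0])
    return [sum(p[c] for p in source_probs) / k for c in range(n_classes)]


def sample_specific_ensemble(
    source_probs: List[List[float]], scores: List[float]
) -> List[float]:
    """Weighted ensemble whose weights are chosen for this one sample.

    `scores` holds one raw score per source model for the current target
    sample. In a real system these would come from a learned component;
    here they are hypothetical inputs.
    """
    weights = softmax(scores)
    n_classes = len(source_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, source_probs))
        for c in range(n_classes)
    ]


# Toy example: three source models, two classes, one target sample.
source_probs = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
# Hypothetical scores saying the first source model is most relevant here.
scores = [2.0, 0.0, 0.0]

uniform = uniform_ensemble(source_probs)
weighted = sample_specific_ensemble(source_probs, scores)
print("uniform ensemble:", uniform)
print("sample-specific ensemble:", weighted)
```

In this toy case the uniform average favors the second class, while the sample-specific weighting shifts the prediction toward the first class because the first source model receives most of the weight for this sample. With equal scores the two functions give the same result, so the uniform ensemble is the special case in which every source model is trusted equally.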

arXiv information

Authors	Xiangyu Peng, Chen Xing, Prafulla Kumar Choubey, Chien-Sheng Wu, Caiming Xiong
Published	2023-03-01 18:56:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
