Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning

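Summary

The prompt-based paradigm for pre-trained language models (PLMs) has achieved substantial success on few-shot natural language processing (NLP) tasks.
However, prior discrete prompt optimization methods rely on expert knowledge to design the base prompt set and to identify high-quality prompts, which makes them costly, inefficient, and subjective.
Existing continuous prompt optimization methods, meanwhile, improve performance by learning ideal prompts from the gradient information of the PLM, but their high computational cost and low readability and generalizability are often a concern.
To address this research gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt Optimization ($DP_2O$) method.
First, we design a multi-round dialogue alignment strategy that uses GPT-4 to generate a readable prompt set.
Next, we propose an efficient prompt screening metric that identifies high-quality prompts with linear complexity.
Finally, we construct a reinforcement learning (RL) framework based on policy gradients to match prompts to inputs optimally.
By training a policy network with only 0.67% of the PLM parameter size on tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in average accuracy on four open-source datasets.
Subsequent experiments also demonstrate that $DP_2O$ has good universality, robustness, and generalization ability.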
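The idea of using policy gradients to match a discrete prompt to each input can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear softmax policy trained with the plain REINFORCE update and a moving-average baseline, and every name in it (`train_prompt_policy`, `best_prompt`, `toy_reward`) is hypothetical. The reward function is a toy stand-in that simply marks one prompt index as correct for each input.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_scores(weights, x):
    """Linear score of each candidate prompt for input features x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def train_prompt_policy(contexts, reward_fn, num_prompts,
                        steps=2000, lr=0.5, seed=0):
    """REINFORCE over a fixed set of candidate prompts (illustrative only).

    contexts  : list of input feature vectors (assumed representation)
    reward_fn : reward_fn(x, k) -> float, reward for using prompt k on x
    """
    rng = random.Random(seed)
    dim = len(contexts[0])
    weights = [[0.0] * dim for _ in range(num_prompts)]
    baseline = 0.0  # moving average of rewards, used to reduce variance

    for _ in range(steps):
        x = rng.choice(contexts)
        probs = softmax(prompt_scores(weights, x))
        # Sample a prompt index from the current policy.
        k = rng.choices(range(num_prompts), weights=probs)[0]
        reward = reward_fn(x, k)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(k | x) w.r.t. logit j is 1[j == k] - probs[j].
        for j in range(num_prompts):
            grad_logit = (1.0 if j == k else 0.0) - probs[j]
            for d in range(dim):
                weights[j][d] += lr * advantage * grad_logit * x[d]
    return weights


def best_prompt(weights, x):
    """Greedy choice: index of the highest-scoring prompt for input x."""
    scores = prompt_scores(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


def toy_reward(x, k):
    """Toy stand-in reward: prompt 0 suits the first input type, prompt 1 the second."""
    target = 0 if x[0] > x[1] else 1
    return 1.0 if k == target else 0.0
```

With two one-hot inputs, calling `train_prompt_policy([[1.0, 0.0], [0.0, 1.0]], toy_reward, num_prompts=3)` and then `best_prompt` on each input shows the policy learning a different prompt per input. In a realistic setting the reward would have to come from how well the downstream model performs with the chosen prompt, which this sketch does not model.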

Abstract (original)

Prompt-based pre-trained language models (PLMs) paradigm have succeeded substantially in few-shot natural language processing (NLP) tasks. However, prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts, which is costly, inefficient, and subjective. Meanwhile, existing continuous prompt optimization methods improve the performance by learning the ideal prompts through the gradient information of PLMs, whose high computational cost, and low readability and generalizability are often concerning. To address the research gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment strategy for readability prompt set generation based on GPT-4. Furthermore, we propose an efficient prompt screening metric to identify high-quality prompts with linear complexity. Finally, we construct a reinforcement learning (RL) framework based on policy gradients to match the prompts to inputs optimally. By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets. Moreover, subsequent experiments also demonstrate that $DP_2O$ has good universality, robustness, and generalization ability.

arXiv information

Authors	Chengzhengxu Li,Xiaoming Liu,Yichen Wang,Duyi Li,Yu Lan,Chao Shen
Published	2023-08-14 16:58:50+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
