Large Language Models Are Human-Level Prompt Engineers

Summary

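Large language models (LLMs), when conditioned on natural language instructions, have shown impressive capabilities as general-purpose computers.
However, task performance depends heavily on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans.
Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection.
Our method treats the instruction as a "program" that is optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function.
To evaluate the quality of the selected instruction, we measure the zero-shot performance of another LLM that follows it.
Experiments on 24 NLP tasks show that the automatically generated instructions outperform the prior LLM baseline by a large margin and match or exceed the instructions written by human annotators on 19 of the 24 tasks.
We conduct extensive qualitative and quantitative analyses to explore APE's performance.
We show that APE-engineered prompts can steer models toward truthfulness and/or informativeness, and can improve few-shot learning performance simply by being prepended to standard in-context learning prompts.
See our webpage at https://sites.google.com/view/automatic-prompt-engineer.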
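
The selection step described in the summary (score each candidate instruction, keep the best one) and the prepending step for few-shot prompts can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the score function, the prompt template, or how candidates are sampled, so everything below is an assumption. The function names, the `Instruction:/Input:/Output:` template, exact-match accuracy as the score, and `toy_model` (a stand-in for a real LLM call) are all hypothetical. The candidate-proposal step is omitted; candidates are passed in as a plain list.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input, expected output)


def execution_accuracy(
    instruction: str,
    examples: Sequence[Example],
    run_model: Callable[[str], str],
) -> float:
    """Assumed score: fraction of examples answered correctly, zero-shot."""
    if not examples:
        return 0.0
    correct = 0
    for question, answer in examples:
        # Hypothetical prompt template, not taken from the paper.
        prompt = f"Instruction: {instruction}\nInput: {question}\nOutput:"
        if run_model(prompt).strip() == answer:
            correct += 1
    return correct / len(examples)


def select_instruction(
    candidates: Sequence[str],
    examples: Sequence[Example],
    run_model: Callable[[str], str],
) -> tuple[str, float]:
    """Search the candidate pool and return the best instruction with its score."""
    if not candidates:
        raise ValueError("need at least one candidate instruction")
    scored = [(execution_accuracy(c, examples, run_model), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def build_few_shot_prompt(
    instruction: str, demos: Sequence[Example], query: str
) -> str:
    """Prepend the instruction to a standard in-context learning prompt."""
    blocks = [f"Instruction: {instruction}"]
    for question, answer in demos:
        blocks.append(f"Input: {question}\nOutput: {answer}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: answers correctly only if told to give an antonym."""
    antonyms = {"hot": "cold", "up": "down"}
    instruction_line = prompt.splitlines()[0]
    word = prompt.split("Input: ")[-1].split("\n")[0]
    return antonyms.get(word, "?") if "antonym" in instruction_line else word


if __name__ == "__main__":
    pool = ["Repeat the word.", "Write the antonym of the word."]
    data = [("hot", "cold"), ("up", "down")]
    best, score = select_instruction(pool, data, toy_model)
    print(best, score)
    print(build_few_shot_prompt(best, data, "big"))
```

In practice `run_model` would wrap a call to an actual LLM, and the candidate pool would itself be generated by an LLM from input-output demonstrations, as the summary describes.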

Summary (original)

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the ‘program,’ optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts. Please check out our webpage at https://sites.google.com/view/automatic-prompt-engineer.

arXiv information

Authors	Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba
Published	2023-03-10 17:20:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
