Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting

Summary

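Recent work has shown that prompting large language models with explanations, i.e., the chain-of-thought paradigm, yields strong performance on textual reasoning tasks.
However, subtly different explanations can change downstream task accuracy substantially.
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by non-experts, can lead to mediocre performance.
This paper addresses the problem of how to optimize explanation-infused prompts in a black-box fashion.
It first generates a set of candidate explanations for each example in the prompt using a leave-one-out scheme, then uses a two-stage framework to find an effective combination of these explanations.
In the first stage, the explanations for each in-context example are evaluated in isolation according to two proxy metrics: log likelihood and accuracy on new examples.
In the second stage, combinations of explanations are searched to find one that achieves high performance on a silver-labeled development set.
Results on four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference show that the proxy metrics correlate with ground-truth accuracy and that the overall method improves prompts more effectively than crowdworker annotations and naive search strategies.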

Summary (original)

Recent work has shown how to prompt large language models with explanations to obtain strong performance on textual reasoning tasks, i.e., the chain-of-thought paradigm. However, subtly different explanations can yield widely varying downstream task accuracy. Explanations that have not been ‘tuned’ for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion. We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework. We first evaluate explanations for each in-context example in isolation according to two proxy metrics, log likelihood and accuracy on new examples. Then, we search over combinations of explanations to find one that yields high performance against a silver-labeled development set. Across four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference, results show that our proxy metrics correlate with ground truth accuracy and our overall method can effectively improve prompts over crowdworker annotations and naive search strategies.
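
The two-stage selection described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not say how per-explanation proxy scores are aggregated or how the search is budgeted, so the summed-score ranking, the `budget` parameter, and the names `select_explanations`, `proxy_score`, and `dev_accuracy` are all assumptions made for illustration. The caller supplies both scoring functions; in practice they would wrap calls to a language model.

```python
from itertools import product
from typing import Callable, Sequence


def select_explanations(
    candidates: Sequence[Sequence[str]],
    proxy_score: Callable[[int, str], float],
    dev_accuracy: Callable[[tuple[str, ...]], float],
    budget: int = 8,
) -> tuple[str, ...]:
    """Pick one explanation per in-context example (hypothetical helper).

    candidates[i] holds the candidate explanations for example i.
    proxy_score(i, c) scores candidate c for example i in isolation.
    dev_accuracy(combo) scores a full combination on a silver-labeled dev set.
    """
    # Stage 1: score each candidate explanation on its own.
    scored = [
        [(proxy_score(i, c), c) for c in cands]
        for i, cands in enumerate(candidates)
    ]
    # Rank full combinations by summed proxy score (assumed aggregation).
    # Exhaustive enumeration is only feasible for toy-sized inputs.
    combos = sorted(
        product(*scored),
        key=lambda combo: sum(score for score, _ in combo),
        reverse=True,
    )
    # Stage 2: spend the evaluation budget on the most promising combinations.
    shortlisted = [tuple(c for _, c in combo) for combo in combos[:budget]]
    return max(shortlisted, key=dev_accuracy)
```

With `budget=1` the sketch simply trusts the proxy ranking; a larger budget lets the dev-set evaluation override it, which is the reason for having a second stage at all.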

arXiv information

Authors	Xi Ye, Greg Durrett
Published	2023-10-18 14:42:15+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
