Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks

Summary

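LLMs have the potential to transform many fields, but they still underperform humans on reasoning tasks.
Existing methods have the model generate step-by-step calculations; this study instead asks whether having the LLM analyze the question improves its performance.
The authors propose a new prompting strategy called Question Analysis Prompting (QAP), in which the model is asked to explain the question in $n$ words before solving it.
The value of $n$ influences the length of the response the model generates.
QAP is evaluated on GPT-3.5 Turbo and GPT-4 Turbo using the arithmetic datasets GSM8K, AQuA, and SAT and the commonsense dataset StrategyQA.
QAP is compared with other state-of-the-art prompts, including Chain-of-Thought (CoT), Plan and Solve Prompting (PS+), and Take A Deep Breath (TADB).
On the AQuA and SAT datasets, QAP outperforms all of these state-of-the-art prompts on both GPT-3.5 and GPT-4.
QAP ranks among the top two prompts in 75% of the tests.
A key factor in QAP's performance appears to be response length: detailed responses help on harder questions but can hurt on easy ones.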
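
As an illustration of the strategy described above, the following is a minimal sketch of how a QAP-style prompt might be assembled. The function name `build_qap_prompt`, the default value of `n_words`, and the exact instruction wording are assumptions made for this example; they are not taken from the paper, and the paper's actual prompt text may differ.

```python
def build_qap_prompt(question: str, n_words: int = 50) -> str:
    """Return a prompt asking for an analysis of roughly n_words words
    before the solution.

    The instruction wording below is a hypothetical stand-in, not the
    paper's exact prompt.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    instruction = (
        f"Explain this problem in about {n_words} words. "
        "Then solve it and state the final answer."
    )
    # Put the question first, then the analysis instruction.
    return f"{question.strip()}\n\n{instruction}"


# Example usage: a larger n asks for a longer analysis, which the summary
# suggests helps on harder questions but can hurt on easy ones.
example = "A train travels 60 km in 1.5 hours. What is its average speed?"
print(build_qap_prompt(example, n_words=25))
```

The resulting string can be sent as the user message to whichever chat model is being evaluated; varying `n_words` is the knob that controls how much analysis is requested.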

Summary (original)

Although LLMs have the potential to transform many fields, they still underperform humans in reasoning tasks. Existing methods induce the model to produce step-by-step calculations, but this research explores the question: Does making the LLM analyze the question improve its performance? We propose a novel prompting strategy called Question Analysis Prompting (QAP), in which the model is prompted to explain the question in $n$ words before solving. The value of $n$ influences the length of response generated by the model. QAP is evaluated on GPT 3.5 Turbo and GPT 4 Turbo on arithmetic datasets GSM8K, AQuA, and SAT and commonsense dataset StrategyQA. QAP is compared with other state-of-the-art prompts including Chain-of-Thought (CoT), Plan and Solve Prompting (PS+) and Take A Deep Breath (TADB). QAP outperforms all state-of-the-art prompts on AQuA and SAT datasets on both GPT3.5 and GPT4. QAP consistently ranks among the top-2 prompts on 75\% of the tests. A key factor of QAP performance can be attributed to response length, where detailed responses are beneficial when answering harder questions, but can negatively affect easy questions.

arXiv information

Authors	Dharunish Yugeswardeenoo, Kevin Zhu, Sean O’Brien
Published	2024-08-26 08:09:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
