Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners

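Summary

Chain-of-thought prompting has improved the performance of large language models (LLMs) across a variety of NLP tasks.
However, following~\citet{cot_wei}, it still falls short on complex reasoning tasks, where it produces understanding errors, calculation errors, and process errors (e.g., missing steps and hallucinations).
An in-depth analysis of these error types shows that deeply understanding the whole problem is critical for complex reasoning tasks.
Inspired by how humans solve complex reasoning problems, this paper proposes a new prompting strategy, Deeply Understanding the Problems (DUP) prompting, designed to strengthen an LLM's comprehensive understanding of a problem.
It consists of three stages: 1) extract the core question;
2) find problem-solving information based on the core question;
3) generate and extract the answer with the LLM.
DUP prompting is evaluated on ten diverse reasoning datasets.
The experimental results suggest that DUP prompting significantly outperforms Zero-Shot CoT~\cite{kojima2022large} on all datasets.
Notably, DUP achieves \textbf{state-of-the-art results on SVAMP (90.4\% to 94.2\%) and GSM8K (94.6\% to 97.1\%).}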
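
The three stages listed above can be sketched as a small prompting pipeline. This is a minimal illustration, not the authors' implementation: the `llm` callable, the function name `dup_prompt`, and every prompt wording below are assumptions made for the example, and stage 3 is split into a "generate" call and an "extract" call as one plausible reading of "generate and extract answers".

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def dup_prompt(problem: str, llm: LLM) -> str:
    """Run a three-stage, DUP-style prompting pipeline (illustrative only)."""
    # Stage 1: extract the core question from the problem text.
    core_question = llm(
        f"{problem}\n"
        "Extract the core question of this problem, and nothing else."
    )

    # Stage 2: find the problem-solving information relevant to the core question.
    info = llm(
        f"{problem}\n"
        f"Core question: {core_question}\n"
        "List the information in the problem needed to answer the core question."
    )

    # Stage 3a: generate a step-by-step answer using both earlier outputs.
    reasoning = llm(
        f"{problem}\n"
        f"Hint: {info}\n"
        f"Core question: {core_question}\n"
        "Answer the core question step by step."
    )

    # Stage 3b: extract only the final answer from the generated reasoning.
    answer = llm(f"{reasoning}\nState only the final answer.")
    return answer.strip()
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; a real setup would wrap its own client in a function with this shape.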

Summary (original)

Chain of Thought prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks. However, it still has shortcomings when dealing with complex reasoning tasks, following~\citet{cot_wei}, including understanding errors, calculation errors and process errors (e.g., missing steps and hallucinations). Subsequently, our in-depth analysis of various error types has found that deeply understanding the whole problem is critical in addressing complicated reasoning tasks. In this paper, we propose a novel prompt strategy called Deeply Understanding the Problems (DUP) prompting, inspired by how humans solve complex reasoning problems, designed to enhance the comprehensive understanding of problems by LLMs. It consists of three stages: 1) extract the core question; 2) find out problem-solving information based on the core question; 3) generate and extract answers by LLMs. We evaluate the performance of DUP prompting on ten diverse reasoning datasets. Experimental results suggest that DUP prompting significantly outperforms Zero-Shot CoT~\cite{kojima2022large} across all datasets. Notably, DUP achieves \textbf{state-of-the-art on SVAMP (90.4\% to 94.2\%) and GSM8K (94.6\% to 97.1\%).}

arXiv information

Authors	Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao
Published	2024-04-23 12:16:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
