Learning From Mistakes Makes LLM Better Reasoner

Summary

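Large language models (LLMs) have recently shown remarkable reasoning ability when solving math problems.
To improve this ability further, this work proposes Learning from Mistakes (LeMa), a method modeled on the way humans learn.
Consider a student who has failed to solve a math problem: they learn from the mistake they made and from how to correct it.
LeMa mimics this error-driven learning process by fine-tuning LLMs on mistake-correction data pairs generated by GPT-4.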
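Specifically, the authors first collect inaccurate reasoning paths from various LLMs and then use GPT-4 as a "corrector" that (1) identifies the mistaken step, (2) explains why it is wrong, and (3) corrects the mistake and generates the final answer.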
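To make the shape of this training data concrete, here is a minimal Python sketch that turns one corrector output into a fine-tuning example. The class `CorrectionExample`, the function `to_training_pair`, the prompt template, and the rule that drops corrections whose final answer disagrees with the reference answer are all illustrative assumptions, not the authors' released data format; the GPT-4 call that produces the correction is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CorrectionExample:
    """One mistake-correction record (hypothetical field names)."""
    question: str
    wrong_steps: List[str]      # inaccurate reasoning path collected from some LLM
    mistake_step: int           # (1) 1-based index of the step the corrector flags
    explanation: str            # (2) why that step is wrong
    corrected_steps: List[str]  # (3) corrected reasoning
    final_answer: str           # final answer produced by the corrector


def to_training_pair(ex: CorrectionExample, gold_answer: str) -> Optional[dict]:
    """Build a (prompt, target) pair for fine-tuning, or None if unusable.

    Assumption: corrections whose final answer does not match the gold
    answer, or that point at a nonexistent step, are discarded.
    """
    if ex.final_answer.strip() != gold_answer.strip():
        return None
    if not 1 <= ex.mistake_step <= len(ex.wrong_steps):
        return None
    # The model sees the question plus the flawed reasoning path...
    path = "\n".join(f"Step {i}: {s}" for i, s in enumerate(ex.wrong_steps, start=1))
    prompt = f"Question: {ex.question}\nIncorrect reasoning:\n{path}\nCorrection:"
    # ...and learns to emit the three-part correction.
    fixed = "\n".join(ex.corrected_steps)
    target = (
        f"Incorrect step: Step {ex.mistake_step}\n"
        f"Explanation: {ex.explanation}\n"
        f"Correct solution:\n{fixed}\n"
        f"Final answer: {ex.final_answer}"
    )
    return {"prompt": prompt, "target": target}


if __name__ == "__main__":
    ex = CorrectionExample(
        question="Tom has 3 boxes of 4 apples and eats 2. How many are left?",
        wrong_steps=["3 * 4 = 7", "7 - 2 = 5"],
        mistake_step=1,
        explanation="3 * 4 is 12, not 7.",
        corrected_steps=["3 * 4 = 12", "12 - 2 = 10"],
        final_answer="10",
    )
    pair = to_training_pair(ex, gold_answer="10")
    print(pair["prompt"])
    print(pair["target"])
```

Because LeMa is evaluated against fine-tuning on CoT data alone, pairs like these would be mixed into the ordinary CoT training set (for example `cot_pairs + correction_pairs`) rather than used on their own.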
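The experimental results demonstrate LeMa's effectiveness: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance compared with fine-tuning on chain-of-thought (CoT) data alone.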
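Notably, LeMa also benefits specialized LLMs such as WizardMath and MetaMath, reaching 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH.
This surpasses the state-of-the-art (SOTA) performance previously achieved on these challenging tasks by open-source models that do not rely on code execution.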
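pass@1 accuracy is the percentage of test problems for which the model's single generated answer (for example, from greedy decoding) is correct. A minimal sketch, assuming each model output has already been reduced to a final-answer string and is compared to the reference by exact string match; real evaluations usually normalize numeric answers first, and the function name `pass_at_1` is hypothetical.

```python
from typing import Sequence


def pass_at_1(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of problems whose single predicted answer matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("empty evaluation set")
    # One sample per problem: a problem counts only if that one answer is right.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    print(pass_at_1(["10", "5", "7"], ["10", "6", "7"]))
```

With only one sample per problem, pass@1 reduces to plain accuracy.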
The code, data, and models will be made publicly available at https://github.com/microsoft/CodeT.

Summary (original)

Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), akin to human learning processes. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a ‘corrector’ to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves the performance compared with fine-tuning on CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the SOTA performance achieved by non-execution open-source models on these challenging tasks. Our code, data and models will be publicly available at https://github.com/microsoft/CodeT.

arXiv information

Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
Published: 2023-10-31 17:52:22+00:00
arXiv page: arxiv_id (pdf)

Source and services used

arxiv.jp, Google

