Learning From Mistakes Makes LLM Better Reasoner

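Summary

Large language models (LLMs) have recently shown remarkable reasoning ability on math problems.
To improve that ability further, this work investigates whether LLMs can LEarn from MistAkes (LEMA), much as humans do.
Consider a human student who fails to solve a math problem: the student learns from the mistake made and from how to correct it.
Mimicking this error-driven learning process, LEMA incorporates mistake-correction data pairs when fine-tuning LLMs.
Specifically, we first collect inaccurate reasoning paths from various LLMs, then use GPT-4 as a "corrector" to identify the mistaken step, explain why it is wrong, correct it, and generate the final answer.
In addition, we apply a correction-centric evolution strategy that effectively expands the set of questions used to generate correction data.
Experiments across various LLMs and reasoning tasks show that LEMA effectively improves on CoT-alone fine-tuning.
Further ablations show that CoT data and correction data are not homogeneous in their effectiveness.
These results suggest significant potential for LLMs to improve by learning from their mistakes.
Our code, models, and prompts are publicly available at https://github.com/microsoft/LEMA.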
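
The data-collection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `ReasoningPath`, `Correction`, `build_correction_example`, and `collect_correction_data` are hypothetical, the prompt wording is invented, and the `corrector` argument stands in for a call to GPT-4. Discarding corrections whose final answer still disagrees with the reference answer is an assumption of this sketch, since the abstract does not describe any filtering.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReasoningPath:
    """A step-by-step solution produced by some LLM (hypothetical type)."""
    question: str
    steps: list[str]
    final_answer: str


@dataclass
class Correction:
    """What the corrector is asked to return (hypothetical type)."""
    mistake_step: int            # 1-based index of the mistaken step
    explanation: str             # why that step is wrong
    corrected_steps: list[str]   # the repaired solution
    final_answer: str


def format_steps(steps: list[str]) -> str:
    """Number the steps so a correction can refer to them."""
    return "\n".join(f"Step {i}: {s}" for i, s in enumerate(steps, start=1))


def build_correction_example(path: ReasoningPath, correction: Correction) -> dict:
    """Turn one (mistake, correction) pair into a fine-tuning record."""
    prompt = (
        f"Question: {path.question}\n"
        f"Original solution:\n{format_steps(path.steps)}\n"
        "Identify the incorrect step, explain the mistake, "
        "and give a corrected solution."
    )
    completion = (
        f"Incorrect step: Step {correction.mistake_step}\n"
        f"Explanation: {correction.explanation}\n"
        f"Corrected solution:\n{format_steps(correction.corrected_steps)}\n"
        f"Final answer: {correction.final_answer}"
    )
    return {"prompt": prompt, "completion": completion}


def collect_correction_data(
    paths: list[ReasoningPath],
    gold_answers: dict[str, str],
    corrector: Callable[[ReasoningPath], Correction],
) -> list[dict]:
    """Keep only inaccurate paths and pair each with a correction."""
    examples = []
    for path in paths:
        gold = gold_answers[path.question].strip()
        if path.final_answer.strip() == gold:
            continue  # the path is already correct, so there is nothing to fix
        correction = corrector(path)  # stand-in for the GPT-4 corrector
        if correction.final_answer.strip() != gold:
            continue  # assumption of this sketch: drop unsuccessful corrections
        examples.append(build_correction_example(path, correction))
    return examples
```

Records built this way would then be mixed with ordinary CoT examples during fine-tuning; how the two are weighted is left open here.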

Summary (original)

Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this error-driven learning process, LEMA incorporates mistake-correction data pairs during fine-tuning LLMs. Specifically, we first collect inaccurate reasoning paths from various LLMs, and then employ GPT-4 as a "corrector" to identify the mistake step, explain the reason for the mistake, correct the mistake and generate the final answer. In addition, we apply a correction-centric evolution strategy that effectively expands the question set for generating correction data. Experiments across various LLMs and reasoning tasks show that LEMA effectively improves CoT-alone fine-tuning. Our further ablations shed light on the non-homogeneous effectiveness between CoT data and correction data. These results suggest a significant potential for LLMs to improve through learning from their mistakes. Our code, models and prompts are publicly available at https://github.com/microsoft/LEMA.

arXiv information

Authors	Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
Published	2024-03-29 07:17:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
