SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Summary

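There is a growing trend of teaching large language models (LLMs) to solve mathematical problems by writing code.
Existing work mainly prompts powerful closed-source models to generate seed training data and then augments it with in-domain data, which gives LLMs considerable capability for code-assisted mathematical reasoning.
However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization and restrict their effectiveness to a narrow range of question types.
Conversely, the potential of improving such LLMs with large-scale, expert-written, diverse math question-answer pairs remains unexplored.
To use these resources and address unique challenges such as assessing code responses, the authors propose a new paradigm in which a code-based critic model guides steps including question-code data construction, quality control, and complementary evaluation.
They also explore different alignment algorithms that use self-generated instruction/preference data to foster continuous improvement.
Experiments on both in-domain (up to +5.7%) and out-of-domain (+4.4%) benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.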

Summary (original)

There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-aided mathematical reasoning. However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization abilities and restrict their effectiveness to a narrow range of question types. Conversely, the potential of improving such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains unexplored. To utilize these resources and tackle unique challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments across both in-domain (up to +5.7%) and out-of-domain (+4.4%) benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
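
The abstract does not spell out how the critic is used, so the following is only a minimal sketch of the general idea of critic-guided question-code data construction, under stated assumptions. The paper's critic is a code-based model; here it is replaced by a simple rule-based stand-in that compares a program's printed output with a reference answer. All names (`run_candidate`, `toy_critic`, `build_question_code_pairs`) and the sample question are hypothetical and not taken from the paper.

```python
import contextlib
import io


def run_candidate(code: str) -> str:
    """Execute a candidate solution and return what it printed (stripped)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Toy input only: real model-generated code should run in a sandbox.
            exec(code, {})
    except Exception:
        return ""  # a crashing candidate produces no usable answer
    return buffer.getvalue().strip()


def toy_critic(output: str, reference: str, tol: float = 1e-6) -> bool:
    """Hypothetical stand-in for a critic: numeric match, else exact string match."""
    try:
        return abs(float(output) - float(reference)) <= tol
    except ValueError:
        return output == reference.strip()


def build_question_code_pairs(question, reference, candidates):
    """Keep only (question, code) pairs whose execution result the critic accepts."""
    return [
        (question, code)
        for code in candidates
        if toy_critic(run_candidate(code), reference)
    ]


question = "A box holds 12 pens. How many pens are in 7 boxes?"
candidates = [
    "print(12 * 7)",     # matches the reference answer
    "print(12 + 7)",     # runs, but the answer is wrong
    "print(undefined)",  # raises NameError
]
kept = build_question_code_pairs(question, "84", candidates)
print(len(kept))
```

In this sketch the accepted pairs would serve as self-generated training data for the next round; a learned critic would take the place of `toy_critic` when reference answers are free-form text rather than a single number.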

arXiv information

Authors	Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Haitao Mi, Dong Yu
Published	2024-08-28 06:33:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
