No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function

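Summary

Large language models (LLMs) have shown strong language understanding and in-context learning abilities, which makes them suitable for natural language processing (NLP) tasks and complex mathematical reasoning.
However, when applied to mathematical reasoning tasks, LLMs often struggle to generate correct reasoning steps and answers, even though they assign high probability to the solutions.
To overcome this limitation and strengthen the mathematical reasoning of fine-tuned LLMs without any additional fine-tuning step, we propose a method that combines Monte Carlo Tree Search (MCTS) with a lightweight energy function to rank decision steps, enabling immediate reaction and precise reasoning.
Specifically, we reformulate the fine-tuned LLM as a residual-based energy model (Residual-EBM) and estimate the parameters of the energy function with noise contrastive estimation.
We then use MCTS, with the energy function acting as a path verifier, to search the output space and evaluate reasoning paths.
Extensive experiments on two mathematical reasoning benchmarks, GSM8k and AQUA-RAT, show that the method significantly improves the pass@1 metric of the fine-tuned model without additional fine-tuning or reinforcement learning with human feedback alignment.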
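
The three ingredients named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common residual-EBM form in which a path's unnormalized log-score is the language model's log-probability minus an energy, the usual noise contrastive estimation loss when noise samples are drawn from the base model itself, and the standard UCT rule for choosing which tree node to explore. All function names and the candidate values are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def residual_score(lm_logprob: float, energy: float) -> float:
    """Unnormalized log-score log p_LM(x) - E(x) of a reasoning path x."""
    return lm_logprob - energy


def nce_loss(real_energies, noise_energies) -> float:
    """NCE loss, assuming the noise distribution is the base LM itself.

    Real paths are pushed toward low energy, LM-sampled paths toward high.
    """
    pos = sum(log_sigmoid(-e) for e in real_energies) / len(real_energies)
    neg = sum(log_sigmoid(e) for e in noise_energies) / len(noise_energies)
    return -(pos + neg)


def uct(mean_value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT selection score; unvisited children are tried first."""
    if visits == 0:
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


# Hypothetical candidates: (label, LM log-probability, energy).
candidates = [("A", -1.0, 3.0), ("B", -1.5, 0.2)]
best = max(candidates, key=lambda cand: residual_score(cand[1], cand[2]))
print(best[0])
```

Here candidate A is more probable under the language model, but its high energy pushes it below B, which mirrors the summary's point that a high-probability solution is not necessarily a correct one. In a tree search, a node's `mean_value` could be derived from the energy of the paths passing through it (for example, the negative energy); that mapping is an assumption of this sketch.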

Summary (original)

Large language models (LLMs) demonstrate impressive language understanding and contextual learning abilities, making them suitable for natural language processing (NLP) tasks and complex mathematical reasoning. However, when applied to mathematical reasoning tasks, LLMs often struggle to generate correct reasoning steps and answers despite having high probabilities for the solutions. To overcome this limitation and enhance the mathematical reasoning capabilities of fine-tuned LLMs without additional fine-tuning steps, we propose a method that incorporates Monte Carlo Tree Search (MCTS) and a lightweight energy function to rank decision steps and enable immediate reaction and precise reasoning. Specifically, we re-formulate the fine-tuned LLMs into a Residual-based Energy Model (Residual-EBM) and employ noise contrastive estimation to estimate the energy function’s parameters. We then utilize MCTS with the energy function as a path verifier to search the output space and evaluate the reasoning path. Through extensive experiments on two mathematical reasoning benchmarks, GSM8k and AQUA-RAT, we demonstrate the exceptional capabilities of our method, which significantly improves the pass@1 metric of the fine-tuned model without requiring additional fine-tuning or reinforcement learning with human feedback alignment.

arXiv information

Author	Haotian Xu
Published	2023-09-12 03:03:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
