ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization

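Abstract

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs).
However, these methods often suffer from overthinking, which produces unnecessarily long or redundant reasoning traces.
Existing approaches try to mitigate this by curating multiple reasoning chains for training LLMs, but their effectiveness is often limited by the quality of the generated data, and they are prone to overfitting.
To address this challenge, we propose Reasoning Compression ThroUgh Stepwise Trials (ReCUT), a new method that aims to balance the accuracy and length of reasoning trajectories.
Specifically, ReCUT employs a stepwise exploration mechanism and a long-short switched sampling strategy, enabling LLMs to incrementally generate diverse reasoning paths.
These paths are evaluated and used to construct preference pairs for training two specialized models (Gemini LLMs), one optimized for reasoning accuracy and the other for shorter reasoning.
A final integrated model is obtained by interpolating the parameters of these two models.
Experimental results across multiple math reasoning datasets and backbone models show that ReCUT reduces reasoning length by approximately 30-50% while maintaining or improving reasoning accuracy compared with various baselines.
All code and data will be released at https://github.com/NEUIR/ReCUT.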
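
The abstract does not state how the sampled reasoning paths are scored or paired, so the following is only a minimal sketch under stated assumptions: each path carries a correctness flag and a token count, the accuracy-oriented pairs prefer a correct path over an incorrect one, and the length-oriented pair prefers the shortest correct path over the longest correct one. The names `ReasoningPath` and `build_preference_pairs` are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReasoningPath:
    """One sampled reasoning path (hypothetical container, not from the paper)."""
    text: str
    is_correct: bool
    num_tokens: int


Pair = Tuple[str, str]  # (chosen, rejected)


def build_preference_pairs(paths: List[ReasoningPath]) -> Tuple[List[Pair], List[Pair]]:
    """Build (accuracy_pairs, length_pairs) under the assumptions named above."""
    # Correct paths sorted from shortest to longest.
    correct = sorted((p for p in paths if p.is_correct), key=lambda p: p.num_tokens)
    incorrect = [p for p in paths if not p.is_correct]

    # Assumed accuracy criterion: any correct path is preferred over any incorrect one.
    accuracy_pairs = [(c.text, w.text) for c in correct for w in incorrect]

    # Assumed length criterion: among correct paths, shortest is preferred over longest.
    length_pairs: List[Pair] = []
    if len(correct) >= 2 and correct[0].num_tokens < correct[-1].num_tokens:
        length_pairs.append((correct[0].text, correct[-1].text))

    return accuracy_pairs, length_pairs


paths = [
    ReasoningPath("short-correct", True, 40),
    ReasoningPath("long-correct", True, 120),
    ReasoningPath("wrong", False, 60),
]
accuracy_pairs, length_pairs = build_preference_pairs(paths)
```

Each list could then feed a separate preference-optimization run, one per specialized model; the actual pairing rules used by ReCUT may differ.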

Abstract (original)

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of Large Language Models (LLMs). However, these methods often suffer from overthinking, leading to unnecessarily lengthy or redundant reasoning traces. Existing approaches attempt to mitigate this issue by curating multiple reasoning chains for training LLMs, but their effectiveness is often constrained by the quality of the generated data, and they are prone to overfitting. To address this challenge, we propose Reasoning Compression ThroUgh Stepwise Trials (ReCUT), a novel method aimed at balancing the accuracy and length of reasoning trajectories. Specifically, ReCUT employs a stepwise exploration mechanism and a long-short switched sampling strategy, enabling LLMs to incrementally generate diverse reasoning paths. These paths are evaluated and used to construct preference pairs to train two specialized models (Gemini LLMs), one optimized for reasoning accuracy, the other for shorter reasoning. A final integrated model is obtained by interpolating the parameters of these two models. Experimental results across multiple math reasoning datasets and backbone models demonstrate that ReCUT significantly reduces reasoning lengths by approximately 30-50%, while maintaining or improving reasoning accuracy compared to various baselines. All code and data will be released via https://github.com/NEUIR/ReCUT.
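
The final step, interpolating the parameters of the two specialized models, can be illustrated with a small sketch. It assumes plain linear interpolation with a single mixing weight `alpha` and represents each model as a dictionary from parameter name to a flat list of floats; the abstract does not specify the interpolation scheme or weight, and `interpolate_parameters` is a hypothetical helper rather than the authors' code.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def interpolate_parameters(acc_model: StateDict,
                           short_model: StateDict,
                           alpha: float = 0.5) -> StateDict:
    """Return alpha * acc_model + (1 - alpha) * short_model, parameter by parameter.

    Linear interpolation with one global alpha is an assumption for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if acc_model.keys() != short_model.keys():
        raise ValueError("both models must share the same parameter names")

    merged: StateDict = {}
    for name, acc_values in acc_model.items():
        short_values = short_model[name]
        if len(acc_values) != len(short_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * a + (1.0 - alpha) * s
                        for a, s in zip(acc_values, short_values)]
    return merged


accuracy_weights = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
short_weights = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = interpolate_parameters(accuracy_weights, short_weights, alpha=0.5)
```

With `alpha=1.0` the result equals the accuracy-oriented model and with `alpha=0.0` the length-oriented one, so the weight acts as a dial between accuracy and brevity. With real checkpoints the same loop would run over tensors instead of lists.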

arXiv information

Authors	Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu
Published	2025-06-12 15:43:01+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
