Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking

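Summary

Recent advances in large language models (LLMs) such as OpenAI-o1 and DeepSeek-R1 have demonstrated the effectiveness of test-time scaling, in which extended reasoning processes substantially improve model performance.
Nevertheless, current models are constrained by limitations in handling long texts and by the efficiency of reinforcement learning (RL) training.
To address these issues, we propose Multi-round Thinking, a simple yet effective test-time scaling approach.
The method iteratively refines the model's reasoning by using the answer from the previous round as part of the prompt for the next round.
Extensive experiments across multiple models, including QwQ-32B and DeepSeek-R1, consistently show performance improvements on benchmarks such as AIME 2024, MATH-500, GPQA-Diamond, and LiveCodeBench.
For example, the accuracy of QwQ-32B on the AIME 2024 dataset rose from 80.3% (Round 1) to 82.1% (Round 2), while DeepSeek-R1 showed a similar increase from 79.7% to 82.0%.
These results confirm that Multi-round Thinking is a broadly applicable, straightforward way to obtain stable gains in model performance, and they highlight its potential for the future development of test-time scaling techniques.
Key prompt: {Original question prompt} The assistant's previous answer is: {last round answer}, and please re-answer.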

Summary (original)

Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially enhance model performance. Despite this, current models are constrained by limitations in handling long texts and reinforcement learning (RL) training efficiency. To address these issues, we propose a simple yet effective test-time scaling approach Multi-round Thinking. This method iteratively refines model reasoning by leveraging previous answers as prompts for subsequent rounds. Extensive experiments across multiple models, including QwQ-32B and DeepSeek-R1, consistently show performance improvements on various benchmarks such as AIME 2024, MATH-500, GPQA-diamond, and LiveCodeBench. For instance, the accuracy of QwQ-32B improved from 80.3% (Round 1) to 82.1% (Round 2) on the AIME 2024 dataset, while DeepSeek-R1 showed a similar increase from 79.7% to 82.0%. These results confirm that Multi-round Thinking is a broadly applicable, straightforward approach to achieving stable enhancements in model performance, underscoring its potential for future developments in test-time scaling techniques. The key prompt: {Original question prompt} The assistant’s previous answer is: {last round answer} , and please re-answer.
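
The re-answering loop described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the generate callable stands in for whatever model API is actually used, and the helper names (build_prompt, strip_reasoning, multi_round_thinking, fake_model) are hypothetical. It also assumes that the model wraps its reasoning in <think>...</think> tags and that only the text after that block is carried into the next round as the "previous answer".

```python
import re
from typing import Callable, List, Optional

# Follows the key prompt quoted in the abstract; round 1 uses the bare question.
REANSWER_TEMPLATE = (
    "{question} The assistant's previous answer is: {last_answer}, "
    "and please re-answer."
)

# Assumption: the model emits its chain of thought inside <think>...</think>.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(output: str) -> str:
    """Drop any <think>...</think> span and keep only the final answer text."""
    return _THINK_BLOCK.sub("", output).strip()


def build_prompt(question: str, last_answer: Optional[str]) -> str:
    """Round 1: the question alone. Later rounds: append the previous answer."""
    if last_answer is None:
        return question
    return REANSWER_TEMPLATE.format(question=question, last_answer=last_answer)


def multi_round_thinking(
    question: str,
    generate: Callable[[str], str],
    rounds: int = 2,
) -> List[str]:
    """Run the given number of rounds and return the final answer of each round."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    answers: List[str] = []
    last_answer: Optional[str] = None
    for _ in range(rounds):
        raw = generate(build_prompt(question, last_answer))
        last_answer = strip_reasoning(raw)  # only the answer feeds the next round
        answers.append(last_answer)
    return answers


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, used only for illustration."""
    if "previous answer" in prompt:
        return "<think>re-checking the first attempt</think>42"
    return "<think>first attempt</think>41"


print(multi_round_thinking("What is 6 * 7?", fake_model, rounds=2))
# ['41', '42']
```

Passing the model as a plain callable keeps the loop independent of any particular inference library; the answers from every round are returned so that per-round accuracy (Round 1 vs. Round 2, as in the reported AIME 2024 numbers) can be compared.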

arXiv information

Authors	Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yunjie Ji, Yiping Peng, Han Zhao, Xiangang Li
Published	2025-03-25 17:19:38+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
