A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law

Abstract (translated)

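This survey explores recent advances in reasoning large language models (LLMs) designed to mimic "slow thinking", a reasoning process inspired by human cognition as described in Kahneman's Thinking, Fast and Slow. These models, such as OpenAI's o1, focus on dynamically scaling computational resources during complex tasks such as mathematical reasoning, visual reasoning, medical diagnosis, and multi-agent debate. We present the development of reasoning LLMs and list their key technologies. By synthesizing over 100 studies, we chart a path toward LLMs that combine human-like deep thinking with scalable reasoning efficiency. The methods fall into three categories: (1) test-time scaling, which dynamically adjusts computation to task complexity through search, sampling, and dynamic verification; (2) reinforcement learning, which refines decision-making through iterative improvement using policy networks, reward models, and self-evolution strategies; and (3) slow-thinking frameworks (e.g., long CoT, hierarchical processes), which structure problem-solving into manageable steps. The survey highlights the challenges and further directions of this domain. Understanding and advancing the reasoning abilities of LLMs is crucial for unlocking their full potential in real-world applications, from scientific discovery to decision support systems.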

Abstract (original)

This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic ‘slow thinking’ – a reasoning process inspired by human cognition, as described in Kahneman’s Thinking, Fast and Slow. These models, like OpenAI’s o1, focus on scaling computational resources dynamically during complex tasks, such as math reasoning, visual reasoning, medical diagnosis, and multi-agent debates. We present the development of reasoning LLMs and list their key technologies. By synthesizing over 100 studies, it charts a path toward LLMs that combine human-like deep thinking with scalable efficiency for reasoning. The review breaks down methods into three categories: (1) test-time scaling dynamically adjusts computation based on task complexity via search and sampling, dynamic verification; (2) reinforced learning refines decision-making through iterative improvement leveraging policy networks, reward models, and self-evolution strategies; and (3) slow-thinking frameworks (e.g., long CoT, hierarchical processes) that structure problem-solving with manageable steps. The survey highlights the challenges and further directions of this domain. Understanding and advancing the reasoning abilities of LLMs is crucial for unlocking their full potential in real-world applications, from scientific discovery to decision support systems.

arXiv information

Authors	Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He
Published	2025-05-05 14:14:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
