Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

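Summary

Enhancing the reasoning capabilities of large language models (LLMs) remains an important challenge, especially for complex tasks that require multi-step logical deduction.
Conventional inference-time scaling methods use scalar reward signals from process reward models to evaluate candidate reasoning steps. However, these scalar rewards lack the nuanced qualitative information that is essential for understanding and justifying each step.
In this paper, we propose a novel inference-time scaling approach, stepwise natural language self-critique (PANEL), which uses self-generated natural language critiques as feedback to guide the step-level search process.
By generating rich, human-readable critiques for each candidate reasoning step, PANEL retains essential qualitative information and enables better-informed decision-making during inference.
This approach avoids the need for task-specific verifiers and their associated training overhead, which makes it broadly applicable across diverse tasks.
Experimental results on challenging reasoning benchmarks, including AIME and GPQA, show that PANEL significantly improves reasoning performance and outperforms conventional scalar reward-based methods.
Our code is available at https://github.com/puddingyeah/PANEL to support and encourage future research in this promising area.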

Summary (original)

Enhancing the reasoning capabilities of large language models (LLMs), particularly for complex tasks requiring multi-step logical deductions, remains a significant challenge. Traditional inference time scaling methods utilize scalar reward signals from process reward models to evaluate candidate reasoning steps, but these scalar rewards lack the nuanced qualitative information essential for understanding and justifying each step. In this paper, we propose a novel inference-time scaling approach — stepwise natural language self-critique (PANEL), which employs self-generated natural language critiques as feedback to guide the step-level search process. By generating rich, human-readable critiques for each candidate reasoning step, PANEL retains essential qualitative information, facilitating better-informed decision-making during inference. This approach bypasses the need for task-specific verifiers and the associated training overhead, making it broadly applicable across diverse tasks. Experimental results on challenging reasoning benchmarks, including AIME and GPQA, demonstrate that PANEL significantly enhances reasoning performance, outperforming traditional scalar reward-based methods. Our code is available at https://github.com/puddingyeah/PANEL to support and encourage future research in this promising field.
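The abstract contrasts two ways of choosing among candidate reasoning steps. One ranks them by a scalar score from a process reward model. The other attaches a natural language critique to each candidate and decides based on that text. The following is a minimal sketch of the second idea as a simple greedy step-level search. The greedy strategy is an assumption, and this is not the PANEL implementation. `critique_guided_search`, the `propose`/`critique`/`select` callables, the `ANSWER:` final-step marker and the toy summation task are all hypothetical stand-ins for LLM calls. The rule-based `toy_select` only imitates a model that reads the critiques.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces standing in for LLM calls; these are NOT the PANEL repository's API.
ProposeFn = Callable[[str, List[str], int], List[str]]            # (question, steps so far, k) -> candidate steps
CritiqueFn = Callable[[str, List[str], str], str]                 # (question, steps so far, candidate) -> critique
SelectFn = Callable[[str, List[str], List[str], List[str]], int]  # (question, steps, candidates, critiques) -> index


@dataclass
class SearchResult:
    steps: List[str] = field(default_factory=list)
    critiques: List[str] = field(default_factory=list)  # critique attached to each chosen step


def critique_guided_search(
    question: str,
    propose: ProposeFn,
    critique: CritiqueFn,
    select: SelectFn,
    k: int = 4,
    max_steps: int = 8,
) -> SearchResult:
    """Greedy step-level search in which each candidate step is judged by a natural language critique."""
    result = SearchResult()
    for _ in range(max_steps):
        candidates = propose(question, result.steps, k)
        # One critique per candidate; the selector sees this text instead of a scalar score.
        critiques = [critique(question, result.steps, c) for c in candidates]
        idx = select(question, result.steps, candidates, critiques)
        result.steps.append(candidates[idx])
        result.critiques.append(critiques[idx])
        if candidates[idx].startswith("ANSWER:"):  # assumed convention marking a final step
            break
    return result


# --- Toy, rule-based stand-ins so the sketch runs without a model (hypothetical) ---
NUMS = [3, 5, 7]  # the toy task: add these numbers one term at a time


def toy_propose(question: str, steps: List[str], k: int) -> List[str]:
    n = len(steps) + 1  # how many terms the next step should have summed
    if n > len(NUMS):
        last_total = int(steps[-1].split("=")[1])
        return [f"ANSWER: {last_total}"]
    total = sum(NUMS[:n])
    # A wrong candidate is listed first, so "always take the first" would fail.
    return [f"sum of first {n} = {total + 1}", f"sum of first {n} = {total}"][:k]


def toy_critique(question: str, steps: List[str], candidate: str) -> str:
    if candidate.startswith("ANSWER:"):
        return "The answer restates the last verified partial sum."
    n = int(candidate.split()[3])           # "sum of first n = v" -> n
    claimed = int(candidate.split("=")[1])  # -> v
    actual = sum(NUMS[:n])
    if claimed == actual:
        return f"Correct: the first {n} terms add up to {actual}."
    return f"Error: the first {n} terms add up to {actual}, not {claimed}."


def toy_select(question: str, steps: List[str], candidates: List[str], critiques: List[str]) -> int:
    # Imitates a model reading the critiques: take the first candidate not flagged as an error.
    for i, text in enumerate(critiques):
        if not text.startswith("Error"):
            return i
    return 0


if __name__ == "__main__":
    out = critique_guided_search("Add 3, 5 and 7.", toy_propose, toy_critique, toy_select)
    for step, note in zip(out.steps, out.critiques):
        print(f"{step:<22} | {note}")
```

Running it prints each accepted step next to the critique that justified it. A scalar-reward baseline would instead replace the `critique` and `select` calls with an argmax over per-candidate scores. That would keep only a number per step and discard the explanation of why a candidate was rejected.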

arXiv information

Authors	Yansi Li, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Qiuzhi Liu, Rui Wang, Zhuosheng Zhang, Zhaopeng Tu, Haitao Mi, Dong Yu
Published	2025-03-21 17:59:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google