Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning

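Summary

Test-time scaling, often called slow-thinking, has been shown to strengthen multi-step reasoning in large language models (LLMs).
Despite its wide use, however, the mechanisms behind slow-thinking methods are still poorly understood.
This paper examines the mechanisms of external slow-thinking from a theoretical perspective.
We first study the snowball error effect in the LLM reasoning process and use information theory to connect it to the probability of correct reasoning.
On that basis, we show that external slow-thinking methods can be interpreted as strategies for reducing the error probability.
We also provide a comparative analysis of popular external slow-thinking approaches, from simple to complex, highlighting their differences and interrelationships.
Our findings suggest that the effectiveness of these methods is not primarily determined by the specific framework used, and that expanding the search scope or the model's internal reasoning capacity may yield more sustained improvement in the long run.
We open-source our code at https://github.com/ZyGan1999/Snowball-Errors-and-Probability.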

Summary (original)

Test-time scaling, which is also often referred to as slow-thinking, has been demonstrated to enhance multi-step reasoning in large language models (LLMs). However, despite its widespread utilization, the mechanisms underlying slow-thinking methods remain poorly understood. This paper explores the mechanisms of external slow-thinking from a theoretical standpoint. We begin by examining the snowball error effect within the LLM reasoning process and connect it to the likelihood of correct reasoning using information theory. Building on this, we show that external slow-thinking methods can be interpreted as strategies to mitigate the error probability. We further provide a comparative analysis of popular external slow-thinking approaches, ranging from simple to complex, highlighting their differences and interrelationships. Our findings suggest that the efficacy of these methods is not primarily determined by the specific framework employed, and that expanding the search scope or the model’s internal reasoning capacity may yield more sustained improvements in the long term. We open-source our code at https://github.com/ZyGan1999/Snowball-Errors-and-Probability.

arXiv information

Authors	Zeyu Gan, Yun Liao, Yong Liu
Published	2025-01-28 14:14:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
