Steering LLM Thinking with Budget Guidance

Summary

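Recent deep-thinking large language models often reason at length to improve performance, but such long reasoning is not always desirable, because it incurs excessive inference cost without proportionate performance gains.
Controlling reasoning length without sacrificing performance is therefore important, but it is challenging, especially under tight thinking budgets.
We propose budget guidance, a simple and effective method for steering an LLM's reasoning process toward a target budget without any fine-tuning of the LLM.
Our approach introduces a lightweight predictor that models a Gamma distribution over the remaining thinking length during next-token generation.
This signal is then used to guide generation softly at the token level, so that the overall reasoning trace adheres to the specified thinking budget.
Budget guidance enables natural control of thinking length and yields significant token-efficiency improvements over baseline methods on challenging math benchmarks.
For example, on the MATH-500 benchmark it achieves up to a 26% accuracy gain over baseline methods under tight budgets, while maintaining competitive accuracy with only 63% of the thinking tokens used by the full-thinking model.
Budget guidance also generalizes to broader task domains and exhibits emergent capabilities, such as estimating question difficulty.
The source code is available at https://github.com/UMass-Embodied-AGI/BudgetGuidance.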

Summary (original)

Recent deep-thinking large language models often reason extensively to improve performance, but such lengthy reasoning is not always desirable, as it incurs excessive inference costs with disproportionate performance gains. Controlling reasoning length without sacrificing performance is therefore important, but remains challenging, especially under tight thinking budgets. We propose budget guidance, a simple yet effective method for steering the reasoning process of LLMs toward a target budget without requiring any LLM fine-tuning. Our approach introduces a lightweight predictor that models a Gamma distribution over the remaining thinking length during next-token generation. This signal is then used to guide generation in a soft, token-level manner, ensuring that the overall reasoning trace adheres to the specified thinking budget. Budget guidance enables natural control of the thinking length, along with significant token efficiency improvements over baseline methods on challenging math benchmarks. For instance, it achieves up to a 26% accuracy gain on the MATH-500 benchmark under tight budgets compared to baseline methods, while maintaining competitive accuracy with only 63% of the thinking tokens used by the full-thinking model. Budget guidance also generalizes to broader task domains and exhibits emergent capabilities, such as estimating question difficulty. The source code is available at: https://github.com/UMass-Embodied-AGI/BudgetGuidance.
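
The abstract does not spell out how the Gamma-distribution signal is turned into token-level guidance, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for each candidate next token, a predictor supplies Gamma parameters (shape, rate) over the remaining thinking length, and that the language model's next-token probabilities are reweighted by the probability that this remaining length fits within the remaining budget. The function names, the two candidate tokens, and all numeric values are hypothetical.

```python
import math


def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """P(L <= x) for L ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function.

    Adequate for the moderate values of rate * x used in this toy example;
    a library routine such as scipy.special.gammainc is the robust choice.
    """
    if x <= 0.0:
        return 0.0
    z = rate * x
    # P(a, z) = z^a * e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    for n in range(1, 10_000):
        term *= z / (shape + n)
        total += term
        if term < 1e-12 * total:
            break
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def guide_next_token(lm_probs, gamma_params, remaining_budget):
    """Reweight next-token probabilities by P(remaining length <= budget).

    ASSUMPTION: this Bayes-style reweighting is an illustrative guess at
    what "soft, token-level guidance" could look like.
    """
    weights = [
        p * gamma_cdf(remaining_budget, shape, rate)
        for p, (shape, rate) in zip(lm_probs, gamma_params)
    ]
    norm = sum(weights)
    if norm == 0.0:
        # No candidate fits the budget; fall back to the unguided model.
        return list(lm_probs)
    return [w / norm for w in weights]


# Hypothetical candidates: index 0 keeps reasoning, index 1 ends the thinking phase.
lm_probs = [0.7, 0.3]
# Hypothetical predictor outputs as (shape, rate); the mean is shape / rate tokens.
gamma_params = [(2.0, 0.01), (1.0, 0.1)]  # means of about 200 and 10 tokens

guided_tight = guide_next_token(lm_probs, gamma_params, remaining_budget=50)
guided_loose = guide_next_token(lm_probs, gamma_params, remaining_budget=500)
print("tight budget:", guided_tight)
print("loose budget:", guided_loose)
```

With a tight remaining budget, the reweighting moves most of the probability mass to the token that ends the thinking phase; with a loose budget, the guided distribution stays close to the model's own. This is the sense in which such guidance is soft: it biases sampling rather than truncating the trace.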

arXiv information

Authors	Junyan Li, Wenshuo Zhao, Yang Zhang, Chuang Gan
Published	2025-06-16 17:57:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
