What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Abstract

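As enthusiasm for scaling computation (data and parameters) in the pretraining era has gradually diminished, test-time scaling (TTS), also called "test-time computing", has emerged as a prominent research focus.
Recent studies show that TTS can further elicit the problem-solving capabilities of large language models (LLMs), enabling significant breakthroughs not only in specialized reasoning tasks such as mathematics and coding, but also in general tasks such as open-ended Q&A.
However, despite the explosion of recent work in this area, a comprehensive survey offering a systematic understanding is still urgently needed.
To fill this gap, we propose a unified, multidimensional framework structured along four core dimensions of TTS research: what to scale, how to scale, where to scale, and how well to scale.
Building on this taxonomy, we conduct an extensive review of methods, application scenarios, and assessment aspects, and present an organized decomposition that highlights the unique functional role of each technique within the broader TTS landscape.
From this analysis, we distill the major developmental trajectories of TTS to date and offer hands-on guidelines for practical deployment.
Furthermore, we identify several open challenges and offer insights into promising future directions, including further scaling, clarifying the functional essence of techniques, generalizing to more tasks, and more attributions.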

Abstract (original)

As enthusiasm for scaling computation (data and parameters) in the pretraining era gradually diminished, test-time scaling (TTS), also referred to as “test-time computing”, has emerged as a prominent research focus. Recent studies demonstrate that TTS can further elicit the problem-solving capabilities of large language models (LLMs), enabling significant breakthroughs not only in specialized reasoning tasks, such as mathematics and coding, but also in general tasks like open-ended Q&A. However, despite the explosion of recent efforts in this area, there remains an urgent need for a comprehensive survey offering a systemic understanding. To fill this gap, we propose a unified, multidimensional framework structured along four core dimensions of TTS research: what to scale, how to scale, where to scale, and how well to scale. Building upon this taxonomy, we conduct an extensive review of methods, application scenarios, and assessment aspects, and present an organized decomposition that highlights the unique functional roles of individual techniques within the broader TTS landscape. From this analysis, we distill the major developmental trajectories of TTS to date and offer hands-on guidelines for practical deployment. Furthermore, we identify several open challenges and offer insights into promising future directions, including further scaling, clarifying the functional essence of techniques, generalizing to more tasks, and more attributions.

arXiv information

Authors	Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Irwin King, Xue Liu, Chen Ma
Published	2025-03-31 15:46:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
