Time’s Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint

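Abstract

Recent work has demonstrated the remarkable potential of large language models (LLMs) in test-time scaling.
By having a model think before it answers, much higher accuracy can be achieved at the cost of extra inference computation.
In many real-world scenarios, however, models are used under time constraints and must deliver an answer to the user within a certain output length.
It is unclear whether, and how, the reasoning abilities of LLMs remain effective under such constraints.
We take a first look at this problem through an in-depth empirical study.
Specifically, we test more than 25 LLMs on common reasoning datasets under a wide range of output length budgets, and analyze how inference accuracy correlates with properties including model type, model size, and prompt style.
We also consider the mapping between token budgets and actual on-device latency budgets.
The results reveal several interesting findings about budget-aware LLM reasoning that differ from the unconstrained setting.
For example, the optimal choices of model size and prompt change under different budgets.
These findings offer practical guidance for users deploying LLMs under real-world latency constraints.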

Abstract (original)

Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making the models think before answering, they are able to achieve much higher accuracy with extra inference computation. However, in many real-world scenarios, models are used under time constraints, where an answer should be given to the user within a certain output length. It is unclear whether and how the reasoning abilities of LLMs remain effective under such constraints. We take a first look at this problem by conducting an in-depth empirical study. Specifically, we test more than 25 LLMs on common reasoning datasets under a wide range of output length budgets, and we analyze the correlation between the inference accuracy and various properties including model type, model size, prompt style, etc. We also consider the mappings between the token budgets and the actual on-device latency budgets. The results have demonstrated several interesting findings regarding the budget-aware LLM reasoning that differ from the unconstrained situation, e.g. the optimal choices of model sizes and prompts change under different budgets. These findings offer practical guidance for users to deploy LLMs under real-world latency constraints.

arXiv information

Authors	Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu
Published	2025-04-22 13:31:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
