Token-Budget-Aware LLM Reasoning

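Summary

Reasoning is critical for large language models (LLMs) to perform well across a wide range of tasks.
Methods such as Chain-of-Thought (CoT) reasoning improve LLM performance by decomposing a problem into intermediate steps, but they add substantial token overhead, which increases cost.
The authors find that the reasoning process of current LLMs is unnecessarily long and can be compressed by including a reasonable token budget in the prompt, although the choice of budget strongly affects how much compression is actually achieved.
They then propose a token-budget-aware LLM reasoning framework that dynamically estimates a token budget for each problem based on its reasoning complexity and uses the estimated budget to guide the reasoning process.
Experiments show that the method effectively reduces the token cost of CoT reasoning with only a slight drop in performance, offering a practical way to balance efficiency and accuracy in LLM reasoning.
Code: https://github.com/GeniusHTX/TALE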

Summary (original)

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and it can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework, which dynamically estimates token budgets for different problems based on reasoning complexity and uses the estimated token budgets to guide the reasoning process. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE.
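
The two steps the abstract describes, estimating a per-problem token budget and then placing that budget in the prompt, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `estimate_budget` is a hypothetical word-and-number-count heuristic standing in for the paper's complexity-based estimator, and the prompt wording in `build_prompt` is likewise an assumption. The actual method is in the linked repository.

```python
import re


def estimate_budget(question: str, base: int = 20, per_word: int = 2,
                    per_number: int = 10, lo: int = 16, hi: int = 512) -> int:
    """Hypothetical stand-in for a complexity-based budget estimator.

    Assumption: longer questions and questions with more numeric
    quantities need more reasoning tokens. The constants are arbitrary.
    """
    words = len(question.split())
    numbers = len(re.findall(r"\d+(?:\.\d+)?", question))
    raw = base + per_word * words + per_number * numbers
    # Clamp so the budget is never trivially small or unbounded.
    return max(lo, min(hi, raw))


def build_prompt(question: str, budget: int) -> str:
    """Attach the token budget to a CoT-style instruction (assumed wording)."""
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens:"
    )


def budget_aware_prompt(question: str) -> str:
    """Estimate a budget for this question, then build the guided prompt."""
    return build_prompt(question, estimate_budget(question))


if __name__ == "__main__":
    print(budget_aware_prompt("What is 2 + 3?"))
```

The resulting string would be sent to whichever LLM client is in use. Keeping estimation separate from prompt construction makes it easy to swap the heuristic for a learned or model-based estimator.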

arXiv information

Authors	Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang
Published	2024-12-24 16:55:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
