An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Summary

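The optimal training configurations of large language models (LLMs) with respect to model size and compute budget have been studied extensively, but how to configure LLMs optimally at inference time has not been explored in comparable depth. We study compute-optimal inference: designing models and inference strategies that best trade additional inference-time compute for improved performance. As a first step towards understanding and designing compute-optimal inference methods, we evaluate the effectiveness and computational efficiency of several inference strategies (Greedy Search, Majority Voting, Best-of-N, Weighted Voting, and their variants) on two different tree search algorithms, across different model sizes and compute budgets. We find that a smaller language model paired with a novel tree search algorithm typically achieves a Pareto-optimal trade-off. These results highlight the potential benefit of deploying smaller models equipped with more sophisticated decoding algorithms in budget-constrained scenarios, such as end devices, to improve problem-solving accuracy. For example, the Llemma-7B model achieves accuracy competitive with the Llemma-34B model on MATH500 while using $2\times$ fewer FLOPs. Our findings could potentially apply to any generation task with a well-defined measure of success.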

Abstract (original)

The optimal training configurations of large language models (LLMs) with respect to model sizes and compute budgets have been extensively studied. But how to optimally configure LLMs during inference has not been explored in sufficient depth. We study compute-optimal inference: designing models and inference strategies that optimally trade off additional inference-time compute for improved performance. As a first step towards understanding and designing compute-optimal inference methods, we assessed the effectiveness and computational efficiency of multiple inference strategies such as Greedy Search, Majority Voting, Best-of-N, Weighted Voting, and their variants on two different Tree Search algorithms, involving different model sizes and computational budgets. We found that a smaller language model with a novel tree search algorithm typically achieves a Pareto-optimal trade-off. These results highlight the potential benefits of deploying smaller models equipped with more sophisticated decoding algorithms in budget-constrained scenarios, e.g., on end-devices, to enhance problem-solving accuracy. For instance, we show that the Llemma-7B model can achieve competitive accuracy to a Llemma-34B model on MATH500 while using $2\times$ less FLOPs. Our findings could potentially apply to any generation task with a well-defined measure of success.
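The answer-aggregation strategies named in the abstract can be illustrated with a minimal Python sketch. It assumes that each sampled solution yields a final answer string and a scalar score (for example from a reward model; the abstract does not say how samples are scored). The function names and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the most frequent final answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, scores):
    """Return the answer of the single highest-scoring sample."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_vote(answers, scores):
    """Sum the scores per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Hypothetical final answers and scores for six sampled solutions.
answers = ["12", "15", "12", "15", "15", "7"]
scores = [0.70, 0.20, 0.75, 0.10, 0.15, 0.80]

print(majority_vote(answers))          # 15
print(best_of_n(answers, scores))      # 7
print(weighted_vote(answers, scores))  # 12
```

On this toy data the three strategies disagree: the most common answer, the single best-scored sample, and the answer with the highest total score are all different, which is why the choice of strategy can matter at a fixed sampling budget.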

arXiv information

Authors	Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
Published	2024-08-01 17:16:04+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, DeepL
