On Speeding Up Language Model Evaluation

Summary

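Large language models (LLMs) currently dominate natural language processing (NLP) and deliver state-of-the-art results across a wide range of tasks.
Developing such a model, from training through inference, involves many decisions that together define a combinatorial search problem.
For example, choosing the pre-trained LLM, prompt, or hyperparameters that perform best on a task often requires evaluating several candidates on the entire test set.
Because both LLM inference and metric computation are resource-intensive, this exhaustive evaluation can be slow and expensive.
The paper addresses the problem of identifying the best method when the budget for evaluating methods on test examples is limited.
The approach builds on the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate, and combines bandit algorithms with low-rank factorization to substantially reduce the resources required.
In the experiments, the algorithms identify the top-performing method using only 5-15% of the resources normally needed, an 85-95% reduction in cost.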
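The sequential selection described above can be illustrated with a minimal sketch. It uses a plain UCB1-style rule over methods and draws test examples uniformly at random; this is a generic bandit baseline shown for intuition, not the authors' algorithm, and it omits the low-rank component entirely. The function name `ucb_best_method` and the callback `score_fn` are hypothetical, and scores are assumed to lie in [0, 1].

```python
import math
import random


def ucb_best_method(score_fn, n_methods, n_examples, budget, seed=0):
    """Estimate the best method using at most `budget` evaluations.

    score_fn(m, e) is a hypothetical callback that returns a score in [0, 1]
    for method m on test example e. Each call costs one unit of budget.
    """
    rng = random.Random(seed)
    counts = [0] * n_methods   # evaluations spent on each method
    sums = [0.0] * n_methods   # running sum of scores per method

    for t in range(1, budget + 1):
        if t <= n_methods:
            m = t - 1  # evaluate every method once before using the bounds
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            m = max(
                range(n_methods),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        e = rng.randrange(n_examples)  # uniformly sampled test example
        sums[m] += score_fn(m, e)
        counts[m] += 1

    # Methods never evaluated (budget < n_methods) cannot be chosen as best.
    means = [
        sums[i] / counts[i] if counts[i] else float("-inf")
        for i in range(n_methods)
    ]
    best = max(range(n_methods), key=lambda i: means[i])
    return best, counts
```

With 5 methods and 200 test examples, a full evaluation would need 1000 calls to `score_fn`; calling `ucb_best_method(score_fn, 5, 200, budget=200)` spends a fifth of that and concentrates most evaluations on the methods that look strongest so far.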

Abstract (original)

Large language models (LLMs) currently dominate the field of natural language processing (NLP), representing the state-of-the-art across a diverse array of tasks. Developing a model of this nature, from training to inference, requires making numerous decisions which define a combinatorial search problem. For example, selecting the optimal pre-trained LLM, prompt, or hyperparameters to attain the best performance for a task often requires evaluating multiple candidates on an entire test set. This exhaustive evaluation can be time-consuming and costly, as both inference and metric computation with LLMs are resource-intensive. In this paper, we address the challenge of identifying the best method within a limited budget for evaluating methods on test examples. By leveraging the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate, our approach, combining multi-armed bandit algorithms with low-rank factorization, significantly reduces the required resources. Experiments show that our algorithms can identify the top-performing method using only 5-15% of the typically needed resources, resulting in an 85-95% reduction in cost.
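The low-rank idea mentioned in the abstract can be illustrated in isolation: if the method-by-example score matrix is approximately low rank, unobserved entries can be predicted from the observed ones instead of being evaluated. The sketch below fits a rank-1 model by alternating least squares. It is an assumption-laden illustration, not the factorization used in the paper; the name `rank1_complete` and the dictionary layout of `observed` are hypothetical.

```python
def rank1_complete(observed, n_methods, n_examples, iters=50):
    """Fit scores[m][e] ~= u[m] * v[e] from a partial set of observations.

    observed maps (method, example) -> score. Returns the full matrix, with
    observed entries kept as-is and missing entries filled by the model.
    """
    u = [1.0] * n_methods
    v = [1.0] * n_examples
    for _ in range(iters):
        # Fix v and solve the one-dimensional least-squares problem for u[m].
        for m in range(n_methods):
            num = sum(s * v[e] for (mm, e), s in observed.items() if mm == m)
            den = sum(v[e] ** 2 for (mm, e) in observed if mm == m)
            if den > 0:
                u[m] = num / den
        # Fix u and solve for each v[e] in the same way.
        for e in range(n_examples):
            num = sum(s * u[m] for (m, ee), s in observed.items() if ee == e)
            den = sum(u[m] ** 2 for (m, ee) in observed if ee == e)
            if den > 0:
                v[e] = num / den
    return [
        [observed.get((m, e), u[m] * v[e]) for e in range(n_examples)]
        for m in range(n_methods)
    ]
```

Averaging each row of the returned matrix gives an estimate of a method's mean score that uses predictions for the examples that were never evaluated. A rank-1 model is the simplest case; real score matrices may need a higher rank, which this sketch does not handle.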

arXiv information

Authors	Jin Peng Zhou, Christian K. Belardi, Ruihan Wu, Travis Zhang, Carla P. Gomes, Wen Sun, Kilian Q. Weinberger
Published	2024-07-08 17:48:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
