AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding

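Summary

This paper introduces AdaServe, the first LLM serving system to support SLO customization through fine-grained speculative decoding.
AdaServe uses the logits of a draft model to predict how likely each speculated token is to be accepted, and applies a theoretically optimal algorithm to construct the token trees submitted for verification.
To meet diverse SLO requirements without sacrificing throughput, AdaServe adopts a speculation-and-selection scheme: it first builds a candidate token tree for each request, then dynamically selects the tokens that satisfy each request's SLO constraint while optimizing throughput.
Comprehensive evaluations show that AdaServe achieves up to 73% higher SLO attainment and 74% higher goodput than state-of-the-art systems.
These results highlight AdaServe's potential to improve the efficiency and adaptability of LLM deployments across a range of application scenarios.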

Abstract (original)

This paper introduces AdaServe, the first LLM serving system to support SLO customization through fine-grained speculative decoding. AdaServe leverages the logits of a draft model to predict the speculative accuracy of tokens and employs a theoretically optimal algorithm to construct token trees for verification. To accommodate diverse SLO requirements without compromising throughput, AdaServe employs a speculation-and-selection scheme that first constructs candidate token trees for each request and then dynamically selects tokens to meet individual SLO constraints while optimizing throughput. Comprehensive evaluations demonstrate that AdaServe achieves up to 73% higher SLO attainment and 74% higher goodput compared to state-of-the-art systems. These results underscore AdaServe’s potential to enhance the efficiency and adaptability of LLM deployments across varied application scenarios.
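
The speculation-and-selection idea described in the abstract can be illustrated with a small sketch. This is not AdaServe's implementation: the `Node` class, the `select_tokens` function, the per-request `min_expected` target (standing in for an SLO constraint), and the shared verification `budget` are all hypothetical names chosen for illustration. The sketch assumes that the draft model's probability along a path approximates the chance that a token is accepted, so the expected number of accepted tokens is the sum of the path probabilities of the selected nodes. It first picks tokens for each request until its target is met, then spends the remaining budget on the most probable tokens across all requests.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    token: str
    prob: float  # assumed: draft-model probability of this token given its parent
    children: list = field(default_factory=list)


def select_tokens(trees, min_expected, budget):
    """Select up to `budget` tokens from per-request candidate trees.

    trees:        request id -> list of root candidate Nodes
    min_expected: request id -> expected accepted tokens needed this step
                  (a hypothetical stand-in for the request's SLO)
    Returns (chosen tokens per request, expected accepted tokens per request).
    """
    tie = count()  # unique tie-breaker so the heap never compares Node objects
    chosen = {rid: [] for rid in trees}
    expected = {rid: 0.0 for rid in trees}

    # One max-heap per request, keyed on (negated) path probability.
    frontiers = {
        rid: [(-n.prob, next(tie), rid, n) for n in roots]
        for rid, roots in trees.items()
    }
    for heap in frontiers.values():
        heapq.heapify(heap)

    def take(heap):
        # Pop the most probable frontier node and expose its children.
        # A child only enters the heap after its parent is chosen, so the
        # selection always forms a connected subtree.
        neg_p, _, rid, node = heapq.heappop(heap)
        chosen[rid].append(node.token)
        expected[rid] += -neg_p
        for child in node.children:
            heapq.heappush(heap, (neg_p * child.prob, next(tie), rid, child))

    # Phase 1: give each request enough tokens to reach its own target.
    for rid, heap in frontiers.items():
        while budget > 0 and heap and expected[rid] < min_expected.get(rid, 0.0):
            take(heap)
            budget -= 1

    # Phase 2: spend what is left on the globally most probable tokens.
    merged = [item for heap in frontiers.values() for item in heap]
    heapq.heapify(merged)
    while budget > 0 and merged:
        take(merged)
        budget -= 1

    return chosen, expected
```

Because a child's path probability can never exceed its parent's, expanding the frontier greedily is enough to keep the selected tokens tree-shaped. How a real system derives the per-request target from a latency SLO, and how it sets the verification budget, is outside what this sketch covers.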

arXiv information

Authors	Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xupeng Miao, Zhihao Jia
Published	2025-01-21 14:15:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
