Queueing, Predictions, and LLMs: Challenges and Open Problems

Summary

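Queueing systems offer many opportunities to apply machine-learning predictions, such as estimated service times, to improve system performance.
This integration raises many open questions about how predictions can be used effectively to improve scheduling decisions.
Recent studies examine queues with predicted service times, typically aiming to minimize the time jobs spend in the system.
We review these works, highlight the effectiveness of predictions, and present open questions about queue performance.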
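To make "scheduling with predicted service times" concrete, here is a minimal Python sketch, not the authors' model, of a non-preemptive single-server queue that serves waiting jobs either first-come-first-served (FCFS) or shortest-predicted-job-first (SPJF) and reports the mean time in system. The names `Job`, `mean_time_in_system`, and `make_jobs`, the Poisson arrivals, the exponential service times, and the multiplicative log-normal prediction error are all illustrative assumptions.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float    # arrival time
    size: float       # true service time (unknown to the scheduler)
    predicted: float  # predicted service time (hypothetical noisy estimate)


def mean_time_in_system(jobs, policy):
    """Simulate a non-preemptive single-server queue.

    policy="fcfs" serves jobs in arrival order; policy="spjf" serves the
    waiting job with the smallest *predicted* size. Returns mean time in system.
    """
    if policy not in ("fcfs", "spjf"):
        raise ValueError("policy must be 'fcfs' or 'spjf'")
    jobs = sorted(jobs, key=lambda j: j.arrival)
    n = len(jobs)
    t, i, done, total = 0.0, 0, 0, 0.0
    waiting = []  # heap of (priority key, index tiebreak, job)
    while done < n:
        # Admit every job that has arrived by the current time.
        while i < n and jobs[i].arrival <= t:
            key = jobs[i].arrival if policy == "fcfs" else jobs[i].predicted
            heapq.heappush(waiting, (key, i, jobs[i]))
            i += 1
        if not waiting:
            t = jobs[i].arrival  # server idles until the next arrival
            continue
        _, _, job = heapq.heappop(waiting)
        t += job.size              # run the chosen job to completion
        total += t - job.arrival   # its time in system
        done += 1
    return total / n


def make_jobs(n, rate, noise, seed=0):
    """Poisson arrivals, Exp(1) sizes, predictions = size * LogNormal(0, noise)."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        size = rng.expovariate(1.0)
        jobs.append(Job(t, size, size * rng.lognormvariate(0.0, noise)))
    return jobs


if __name__ == "__main__":
    sample = make_jobs(20000, rate=0.8, noise=0.5, seed=1)
    for p in ("fcfs", "spjf"):
        print(p, round(mean_time_in_system(sample, p), 2))
```

Both policies are run on the same sampled jobs, so any difference comes from the ordering rule alone; raising `noise` is one simple way to see how prediction quality affects the benefit.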
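We then consider an important practical example of using predictions in scheduling, namely Large Language Model (LLM) systems, which present novel scheduling challenges and highlight the potential for predictions to improve performance.
In particular, we consider LLMs performing inference.
Inference requests (jobs) in LLM systems are inherently complex.
They have variable inference times, dynamic memory footprints constrained by key-value (KV) store memory limits, and multiple possible preemption approaches that affect performance differently.
We provide background on the important aspects of scheduling in LLM systems, and introduce new models and the open problems that arise from them.
We argue that there are significant opportunities to apply insights and analysis from queueing theory to scheduling in LLM systems.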
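The "dynamic memory footprint" constraint can be illustrated with a small Python sketch. This is a simplified, hypothetical model, not the paper's: a batch of requests starts decoding together, each request holds its prompt in the KV cache and adds one entry per generated token, and it frees all of its memory once it finishes. `Request`, `kv_usage`, and `first_overflow` are illustrative names, and memory is counted in tokens rather than bytes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int  # KV entries already held when decoding starts
    output_tokens: int  # tokens to generate (known here; predicted in practice)


def kv_usage(batch, step):
    """KV-cache tokens held at 0-based decode step `step`.

    Assumes each active request has generated step + 1 tokens by the end of
    the step and releases its memory after its last output token.
    """
    return sum(r.prompt_tokens + step + 1 for r in batch if step < r.output_tokens)


def first_overflow(batch, capacity):
    """Return the first decode step at which usage exceeds `capacity`, or None."""
    horizon = max((r.output_tokens for r in batch), default=0)
    for step in range(horizon):
        if kv_usage(batch, step) > capacity:
            return step
    return None


if __name__ == "__main__":
    batch = [Request(100, 50), Request(200, 10), Request(50, 300)]
    print(first_overflow(batch, 400), first_overflow(batch, 370))
```

For this batch, usage peaks at 380 tokens just before the second request finishes, so a 400-token budget is never exceeded while a 370-token budget is first exceeded at step 6. If `output_tokens` were a prediction, an underestimate could let the batch overflow at run time and force a preemption, which is one way prediction quality feeds into memory-aware scheduling.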

Summary (original)

Queueing systems present many opportunities for applying machine-learning predictions, such as estimated service times, to improve system performance. This integration raises numerous open questions about how predictions can be effectively leveraged to improve scheduling decisions. Recent studies explore queues with predicted service times, typically aiming to minimize job time in the system. We review these works, highlight the effectiveness of predictions, and present open questions on queue performance. We then move to consider an important practical example of using predictions in scheduling, namely Large Language Model (LLM) systems, which presents novel scheduling challenges and highlights the potential for predictions to improve performance. In particular, we consider LLMs performing inference. Inference requests (jobs) in LLM systems are inherently complex; they have variable inference times, dynamic memory footprints that are constrained by key-value (KV) store memory limitations, and multiple possible preemption approaches that affect performance differently. We provide background on the important aspects of scheduling in LLM systems, and introduce new models and open problems that arise from them. We argue that there are significant opportunities for applying insights and analysis from queueing theory to scheduling in LLM systems.

arXiv information

Authors	Michael Mitzenmacher, Rana Shahout
Published	2025-03-10 17:12:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
