Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions

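Summary

Packet buffers in datacenter switches are shared across all switch ports to improve overall throughput.
As buffer sizes in datacenter switches continue to shrink, buffer sharing becomes extremely challenging and a critical performance issue.
The literature indicates that push-out buffer sharing algorithms offer significantly better performance guarantees than drop-tail algorithms.
Unfortunately, switches cannot benefit from these algorithms because hardware does not support push-out operations.
Our key observation is that a drop-tail buffer can emulate a push-out buffer if future packet arrivals are known in advance.
This suggests that augmenting drop-tail algorithms with predictions of future arrivals could significantly improve performance.
This paper is the first research attempt in this direction.
We propose Credence, a drop-tail buffer sharing algorithm augmented with machine-learned predictions.
Credence can unlock performance that until now was attainable only by push-out algorithms.
Its performance hinges on prediction accuracy.
Specifically, with perfect predictions Credence achieves the near-optimal performance of LQD (Longest Queue Drop), the best-known push-out algorithm, and as the prediction error grows arbitrarily large it degrades gracefully to the performance of Complete Sharing, the simplest drop-tail algorithm.
Our evaluation shows that Credence improves throughput by $1.5$x compared to traditional approaches.
In terms of flow completion times, we show that Credence improves on state-of-the-art approaches by up to $95\%$ using off-the-shelf machine learning techniques that are also practical on today's hardware.
We believe this work opens several interesting opportunities for future work in both systems and theory, which we discuss at the end of the paper.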

Abstract (original)

Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing extremely challenging and a critical performance issue. Literature suggests that push-out buffer sharing algorithms have significantly better performance guarantees compared to drop-tail algorithms. Unfortunately, switches are unable to benefit from these algorithms due to lack of support for push-out operations in hardware. Our key observation is that drop-tail buffers can emulate push-out buffers if the future packet arrivals are known ahead of time. This suggests that augmenting drop-tail algorithms with predictions about the future arrivals has the potential to significantly improve performance. This paper is the first research attempt in this direction. We propose Credence, a drop-tail buffer sharing algorithm augmented with machine-learned predictions. Credence can unlock the performance only attainable by push-out algorithms so far. Its performance hinges on the accuracy of predictions. Specifically, Credence achieves near-optimal performance of the best known push-out algorithm LQD (Longest Queue Drop) with perfect predictions, but gracefully degrades to the performance of the simplest drop-tail algorithm Complete Sharing when the prediction error gets arbitrarily worse. Our evaluations show that Credence improves throughput by $1.5$x compared to traditional approaches. In terms of flow completion times, we show that Credence improves upon the state-of-the-art approaches by up to $95\%$ using off-the-shelf machine learning techniques that are also practical in today’s hardware. We believe this work opens several interesting future work opportunities both in systems and theory that we discuss at the end of this paper.
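To make the gap between drop-tail and push-out sharing concrete, the following is a minimal sketch that compares Complete Sharing with LQD on a toy shared buffer. It is not Credence itself and does not use predictions. The slotted-time model (each non-empty port transmits one packet per slot), the function name `simulate`, the tie-breaking rule, and the example trace are all assumptions made for illustration and do not come from the paper.

```python
def simulate(arrivals, num_ports, buffer_size, policy):
    """Count packets transmitted over the given time slots (toy model).

    arrivals[t] lists the destination port of each packet arriving in slot t.
    policy is "cs" (Complete Sharing, drop-tail) or "lqd" (Longest Queue Drop,
    push-out). Packets are indistinguishable, so each queue is just a counter.
    """
    if policy not in ("cs", "lqd"):
        raise ValueError("policy must be 'cs' or 'lqd'")

    queues = [0] * num_ports
    transmitted = 0

    for slot in arrivals:
        # Arrival phase: packets are admitted one at a time, in order.
        for port in slot:
            if sum(queues) < buffer_size:
                queues[port] += 1  # free space: both policies accept
                continue
            if policy == "cs":
                continue  # drop-tail: buffer full, arriving packet is dropped
            # LQD: count the arriving packet toward its own queue, then
            # evict one packet from whichever queue is longest.
            lengths = list(queues)
            lengths[port] += 1
            longest = max(range(num_ports), key=lengths.__getitem__)
            if longest != port:
                queues[longest] -= 1  # push out from the longest queue
                queues[port] += 1     # admit the arriving packet
            # Otherwise the arriving packet's queue is longest: drop it.

        # Departure phase: every non-empty port sends one packet (assumption).
        for p in range(num_ports):
            if queues[p] > 0:
                queues[p] -= 1
                transmitted += 1

    return transmitted


# Hypothetical trace: a burst to port 0 fills the buffer before port 1 traffic arrives.
trace = [[0, 0, 0, 0, 1, 1], [0, 1], [0, 1]]
cs_tx = simulate(trace, num_ports=2, buffer_size=4, policy="cs")
lqd_tx = simulate(trace, num_ports=2, buffer_size=4, policy="lqd")
print("Complete Sharing transmitted:", cs_tx)
print("LQD transmitted:", lqd_tx)
```

On this trace Complete Sharing lets the port 0 burst monopolize the buffer, so port 1 sits idle, while LQD evicts port 0 packets to keep both ports busy. A drop-tail algorithm that knew the port 1 arrivals in advance could simply have refused part of the burst, which is the intuition behind augmenting drop-tail with predictions.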

arXiv information

Authors	Vamsi Addanki, Maciej Pacut, Stefan Schmid
Published	2024-01-05 13:29:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
