I’ve Got 99 Problems But FLOPS Ain’t One

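Summary

Hyperscalers dominate the landscape of large-scale network deployments, yet they share little data or insight about the challenges they face.
Given this dominance, what problems in this space remain for us to solve?
We take an unconventional approach to finding relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications.
Using language model scaling laws, we determine what workloads such a datacenter might carry and examine the challenges that may arise in doing so, with a focus on networking research.
We conclude that building the datacenter and training such models is technically feasible, but that it requires new wide-area transports for inter-DC communication, a multipath transport and new datacenter topologies for intra-datacenter communication, and high-speed scale-up networks and transports.
We outline a rich research agenda for the networking community.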

Summary (original)

Hyperscalers dominate the landscape of large network deployments, yet they rarely share data or insights about the challenges they face. In light of this supremacy, what problems can we find to solve in this space? We take an unconventional approach to find relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications. Leveraging language model scaling laws, we discover what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but this requires novel wide-area transports for inter-DC communication, a multipath transport and novel datacenter topologies for intra-datacenter communication, and high-speed scale-up networks and transports, outlining a rich research agenda for the networking community.

arXiv information

Authors	Alexandru M. Gherghescu, Vlad-Andrei Bădoiu, Alexandru Agache, Mihai-Valentin Dumitru, Iuliu Vasilescu, Radu Mantu, Costin Raiciu
Published	2024-10-23 14:00:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
