Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

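Summary

As Large-Scale Cloud Systems (LCS) grow more complex, effective anomaly detection becomes critical for ensuring system reliability and performance.
However, few large-scale, real-world datasets are available for benchmarking anomaly detection methods.
To address this gap, the authors introduce a new high-dimensional dataset from IBM Cloud, collected from the IBM Cloud Console over 4.5 months.
The dataset consists of 39,365 rows and 117,448 columns of telemetry data.
The authors also demonstrate the application of machine learning models to anomaly detection and discuss the main challenges encountered in the process.
The study and its accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring.
They make it easier to test anomaly detection methods efficiently on real-world data and help advance robust solutions for maintaining the health and performance of large-scale cloud infrastructure.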

Abstract (original)

As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
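To give a sense of what "high-dimensional" means at the stated size, the following minimal sketch works through the arithmetic for a 39,365 x 117,448 table. The abstract does not say how the data is stored, so the dense layout and the 4-byte and 8-byte value widths are assumptions made only for illustration, and `dense_size_bytes` is a hypothetical helper name.

```python
# Dimensions as stated in the abstract.
ROWS = 39_365
COLS = 117_448


def dense_size_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Size of a dense rows x cols matrix.

    bytes_per_value is an assumption (for example 8 for 64-bit floats);
    the source does not state the dataset's storage format.
    """
    return rows * cols * bytes_per_value


# Total number of cells in the table.
cells = ROWS * COLS

# Number of columns (features) per row (observation).
cols_per_row = COLS / ROWS

if __name__ == "__main__":
    print(f"cells: {cells:,}")
    print(f"columns per row: {cols_per_row:.2f}")
    for width in (4, 8):  # assumed value widths in bytes
        size_gb = dense_size_bytes(ROWS, COLS, width) / 1e9
        print(f"dense size at {width} bytes/value: {size_gb:.2f} GB")
```

Under these assumptions the table has roughly three columns for every row, which is why methods that expect many more observations than features may need dimensionality reduction or feature selection first.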

arXiv information

Authors	Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan, John Steinbacher, Andriy Miranskyy
Published	2025-01-06 18:15:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
