Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization

Summary

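K-means clustering is a foundation of data mining, but its efficiency degrades on massive datasets.
To address this limitation, we propose a new heuristic algorithm that uses the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data.
Our approach is based on sequentially optimizing the partial objective function landscapes obtained by restricting the Minimum Sum-of-Squares Clustering (MSSC) formulation to random samples drawn from the original big dataset.
Within each landscape, systematically expanding neighborhoods of the current best (incumbent) solution are explored by reinitializing all degenerate centroids together with a varying number of additional centroids.
Extensive and rigorous experiments on a large number of real-world datasets show that, by turning the traditional local search into a global one, our algorithm significantly improves the accuracy and efficiency of K-means clustering in big data environments and becomes the new state of the art in the field.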

Summary (original)

K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets. To address this limitation, we propose a novel heuristic algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data. Our approach is based on the sequential optimization of the partial objective function landscapes obtained by restricting the Minimum Sum-of-Squares Clustering (MSSC) formulation to random samples from the original big dataset. Within each landscape, systematically expanding neighborhoods of the currently best (incumbent) solution are explored by reinitializing all degenerate and a varying number of additional centroids. Extensive and rigorous experimentation on a large number of real-world datasets reveals that by transforming the traditional local search into a global one, our algorithm significantly enhances the accuracy and efficiency of K-means clustering in big data environments, becoming the new state of the art in the field.
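The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`assign`, `kmeans_local`, `shake`, `vns_stream_kmeans`), the parameters (`sample_size`, `rounds`, `p_max`), the uniform random reinitialization, and the cyclic neighborhood schedule are all assumptions made for illustration. Each round draws a random sample, perturbs the incumbent by reinitializing every degenerate (empty) centroid plus `p` additional ones, runs a K-means local search on the sample, and keeps the candidate only if it lowers the sample's sum-of-squares objective.

```python
import numpy as np

def assign(X, C):
    """Return nearest-centroid labels and the sum-of-squares objective."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels].sum()

def kmeans_local(X, C, iters=20):
    """Plain Lloyd iterations; empty clusters keep their old position."""
    C = C.copy()
    for _ in range(iters):
        labels, _ = assign(X, C)
        for j in range(len(C)):
            members = X[labels == j]
            if len(members):
                C[j] = members.mean(axis=0)
    return C

def shake(X, C, p, rng):
    """Reinitialize all degenerate centroids plus p additional ones.

    Assumption: new positions are uniformly sampled data points.
    """
    C = C.copy()
    labels, _ = assign(X, C)
    counts = np.bincount(labels, minlength=len(C))
    degenerate = np.flatnonzero(counts == 0)
    healthy = np.flatnonzero(counts > 0)
    extra = rng.choice(healthy, size=min(p, len(healthy)), replace=False)
    idx = np.concatenate([degenerate, extra])
    C[idx] = X[rng.choice(len(X), size=len(idx), replace=False)]
    return C

def vns_stream_kmeans(X, k, sample_size=200, rounds=30, p_max=3, seed=0):
    """Hypothetical driver: VNS over a sequence of random samples."""
    rng = np.random.default_rng(seed)
    best = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    p = 1  # current neighborhood size
    for _ in range(rounds):
        S = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]
        _, f_best = assign(S, best)      # incumbent scored on this sample
        cand = kmeans_local(S, shake(S, best, p, rng))
        _, f_cand = assign(S, cand)
        if f_cand < f_best:
            best, p = cand, 1            # improvement: reset neighborhood
        else:
            p = p % p_max + 1            # no improvement: widen, then wrap
    return best
```

Calling `vns_stream_kmeans(X, k=3)` on a float array `X` of shape `(n, d)` with `n >= k` returns a `(3, d)` array of centroids. Because every acceptance decision is made on a sample, only the final assignment pass (if one is needed) has to touch the full dataset.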

arXiv information

Authors	Ravil Mussabayev, Rustam Mussabayev
Published	2024-10-18 15:43:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
