Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

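Summary

Co-clustering clusters rows and columns simultaneously, revealing more fine-grained groups.
However, existing co-clustering methods scale poorly and cannot handle large-scale data.
This paper presents a novel, scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets.
Specifically, the authors first propose a large matrix partitioning algorithm that splits a large matrix into smaller submatrices, enabling parallel co-clustering.
The method uses a probabilistic model to optimize the configuration of the submatrices, balancing computational efficiency against depth of analysis.
They also propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, improving the robustness and reliability of the process.
Extensive evaluations validate the effectiveness and efficiency of the method.
Experimental results show a significant reduction in computation time: approximately 83% for dense matrices and up to 30% for sparse matrices.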

Abstract (original)

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
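
The partition-then-merge idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`partition_matrix`, `merge_coclusters`), the random equal-sized partitioning, the Jaccard-based similarity, and the `threshold` parameter are all assumptions chosen for illustration, and the probabilistic model that the paper uses to choose the submatrix configuration is not reproduced here. The merge step further assumes that co-clusters come from several overlapping partitionings, so that their row and column sets can intersect.

```python
import random
from itertools import combinations


def partition_indices(n, parts, rng):
    """Shuffle 0..n-1 and deal the indices into `parts` nearly equal groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [sorted(idx[i::parts]) for i in range(parts)]


def partition_matrix(n_rows, n_cols, row_parts, col_parts, seed=0):
    """Return (row_indices, col_indices) for each submatrix block.

    Hypothetical helper: each block could then be co-clustered
    independently, for example in a separate worker process.
    """
    rng = random.Random(seed)
    row_groups = partition_indices(n_rows, row_parts, rng)
    col_groups = partition_indices(n_cols, col_parts, rng)
    return [(r, c) for r in row_groups for c in col_groups]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_coclusters(coclusters, threshold=0.5):
    """Greedy agglomerative merge of (rows, cols) co-clusters.

    Assumed criterion: the similarity of two co-clusters is the smaller
    of their row-set and column-set Jaccard scores. The most similar
    pair is merged repeatedly until no pair reaches `threshold`.
    """
    clusters = [(set(r), set(c)) for r, c in coclusters]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sim = min(
                jaccard(clusters[i][0], clusters[j][0]),
                jaccard(clusters[i][1], clusters[j][1]),
            )
            if best is None or sim > best[0]:
                best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break
        merged = (
            clusters[i][0] | clusters[j][0],
            clusters[i][1] | clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, `partition_matrix(6, 4, 2, 2)` yields four blocks whose row groups together cover all six rows and whose column groups cover all four columns. Passing the co-clusters found in each block to `merge_coclusters` then combines those that overlap strongly in both dimensions.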

arXiv information

Authors	Zihan Wu, Zhaoke Huang, Hong Yan
Published	2025-03-19 14:36:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
