Fast and explainable clustering based on sorting

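Summary

We introduce a fast and explainable clustering method called CLASSIX.
It consists of two phases: a greedy aggregation phase that collects the sorted data into groups of nearby data points, followed by a phase that merges the groups into clusters.
The algorithm is controlled by two scalar parameters: a distance parameter for the aggregation and a second parameter that controls the minimal cluster size.
Extensive experiments give a comprehensive evaluation of the clustering performance on synthetic and real-world datasets with various cluster shapes and low to high feature dimensionality.
These experiments show that CLASSIX is competitive with state-of-the-art clustering algorithms.
The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems.
Its inherent simplicity makes it possible to generate intuitive explanations of the computed clusters.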
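
The two phases and two parameters described above can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function name `classix_sketch`, the choice of the first coordinate as the sorting key, the `merge_scale` factor used to decide when two groups are merged, and the labelling of undersized clusters as `-1` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def classix_sketch(points, radius, min_size, merge_scale=1.5):
    """Toy two-phase clustering: greedy aggregation of sorted points,
    then merging of groups into clusters (illustrative assumptions only)."""
    n = len(points)
    # Assumption: sort by the first coordinate (the abstract names no sort key).
    order = sorted(range(n), key=lambda i: points[i][0])
    group_of = [-1] * n
    starts = []  # index of the starting point of each group

    # Phase 1: greedy aggregation into groups of nearby points.
    for pos, i in enumerate(order):
        if group_of[i] != -1:
            continue
        g = len(starts)
        starts.append(i)
        group_of[i] = g
        for k in range(pos + 1, n):
            j = order[k]
            # The gap in the sort key is a lower bound on the distance,
            # so no later point can lie within `radius` of point i.
            if points[j][0] - points[i][0] > radius:
                break
            if group_of[j] == -1 and math.dist(points[i], points[j]) <= radius:
                group_of[j] = g

    # Phase 2: merge groups whose starting points are close (union-find).
    parent = list(range(len(starts)))

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    limit = merge_scale * radius  # assumed merging threshold
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            # `starts` is already ordered by the sort key, so we can stop early.
            if points[starts[b]][0] - points[starts[a]][0] > limit:
                break
            if math.dist(points[starts[a]], points[starts[b]]) <= limit:
                parent[find(b)] = find(a)

    # Apply the minimal cluster size: undersized clusters get label -1 (assumption).
    roots = [find(group_of[i]) for i in range(n)]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    ids = {}
    labels = []
    for r in roots:
        if sizes[r] < min_size:
            labels.append(-1)
        else:
            labels.append(ids.setdefault(r, len(ids)))
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
            (5, 5), (5.1, 5), (5, 5.1), (9, 9)]
    print(classix_sketch(data, radius=0.5, min_size=2))
```

Because the points are sorted first, the inner loops can stop as soon as the gap in the sort key exceeds the search distance, which is what keeps the number of distance computations small when groups are compact. The group starting points also give a natural handle for explanations: each point can be traced to its group, and each group to the cluster it was merged into.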

Abstract (original)

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases, namely a greedy aggregation phase of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters, namely a distance parameter for the aggregation and another parameter controlling the minimal cluster size. Extensive experiments are conducted to give a comprehensive evaluation of the clustering performance on synthetic and real-world datasets, with various cluster shapes and low to high feature dimensionality. Our experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.

arXiv information

Authors	Xinye Chen, Stefan Güttel
Published	2024-02-15 17:02:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
