Clustering with minimum spanning trees: How good can it be?

Summary

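Minimum spanning trees (MSTs) provide a convenient representation of datasets in many pattern recognition tasks.
Moreover, they are relatively fast to compute.
In this paper, we quantify how meaningful they are in low-dimensional partitional data clustering tasks.
By identifying upper bounds on the agreement between the best (oracle) algorithm and the expert labels across a large battery of benchmark data, we find that MST methods can be very competitive.
Next, we review, study, extend, and generalise a few existing state-of-the-art MST-based partitioning schemes.
This leads to some new noteworthy approaches.
Overall, the Genie and information-theoretic methods often outperform non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based methods, and classical hierarchical agglomerative procedures.
Nevertheless, we find that there is still room for improvement, which encourages the development of novel algorithms.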

Summary (original)

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
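To make the idea of MST-based partitioning concrete, here is a minimal pure-Python sketch. It is our own illustration, not the authors' code (the page includes none). It builds a Euclidean MST with Prim's algorithm and then merges clusters along MST edges in order of increasing length until k clusters remain. With the threshold `g` set to 1 this is plain single linkage, equivalent to cutting the k−1 longest MST edges. With a smaller `g` it applies a Genie-style correction, loosely following the published description of Genie: whenever the normalised Gini index of the cluster sizes exceeds `g`, the next merge must involve a smallest cluster. The function names, the O(n²) Prim variant, and the tie-breaking are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (length, endpoint u, endpoint v)


def euclidean_mst(points: List[Sequence[float]]) -> List[Edge]:
    """Prim's algorithm on the complete Euclidean graph (O(n^2) time)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Attach the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


class DisjointSets:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        a, b = self.find(a), self.find(b)
        if a != b:
            if self.size[a] < self.size[b]:
                a, b = b, a
            self.parent[b] = a
            self.size[a] += self.size[b]


def gini(sizes: List[int]) -> float:
    """Normalised Gini index of cluster sizes (0 means all sizes are equal)."""
    m = len(sizes)
    if m <= 1:
        return 0.0
    diffs = sum(abs(a - b) for a in sizes for b in sizes)
    return diffs / (2 * (m - 1) * sum(sizes))


def genie_labels(points: List[Sequence[float]], k: int, g: float = 0.3) -> List[int]:
    """Merge clusters along MST edges until k remain (Genie-style sketch)."""
    n = len(points)
    edges = sorted(euclidean_mst(points))  # shortest edges first
    ds = DisjointSets(n)
    used = [False] * len(edges)
    for _ in range(n - k):  # every unused MST edge joins two clusters
        sizes = [ds.size[r] for r in {ds.find(i) for i in range(n)}]
        if gini(sizes) <= g:
            # Plain single linkage: take the shortest unused MST edge.
            j = used.index(False)
        else:
            # Correction: shortest unused edge touching a smallest cluster.
            smallest = min(sizes)
            j = next(
                idx for idx, (_, u, v) in enumerate(edges)
                if not used[idx]
                and smallest in (ds.size[ds.find(u)], ds.size[ds.find(v)])
            )
        used[j] = True
        ds.union(edges[j][1], edges[j][2])
    # Relabel clusters 0, 1, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(ds.find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (30, 0)]
    print(genie_labels(pts, 2, g=1.0))  # [0, 0, 0, 0, 0, 0, 1]
    print(genie_labels(pts, 2, g=0.2))  # [0, 0, 0, 1, 1, 1, 1]
```

In the toy example, plain single linkage (`g=1.0`) isolates the outlier `(30, 0)` as its own cluster. The Gini-based correction (`g=0.2`) forces the outlier to be absorbed early, yielding two groups of sizes 3 and 4. Recomputing all cluster sizes at every step keeps the sketch short but costs O(n) per merge; a serious implementation would track the size distribution incrementally.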

arXiv information

Authors	Marek Gagolewski, Anna Cena, Maciej Bartoszuk, Łukasz Brzozowski
Published	2024-07-25 14:32:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
