DCSI — An improved measure of cluster separability based on separation and connectedness

Summary

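Whether the class labels in a given data set correspond to meaningful clusters is important when evaluating clustering algorithms on real-world data sets.
This property can be quantified by separability measures.
For density-based clustering, the central aspects of separability are between-class separation and within-class connectedness; neither classification-based complexity measures nor cluster validity indices (CVIs) incorporate them adequately.
The newly developed measure (density cluster separability index, DCSI) aims to quantify these two properties and can also be used as a CVI.
Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN as measured by the adjusted Rand index (ARI), but that it lacks robustness on multi-class data sets with overlapping classes, which are ill-suited to density-based hard clustering.
A detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.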

Abstract (original)

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
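The abstract evaluates DBSCAN against the class labels via the adjusted Rand index (ARI). As a reminder of what that score measures, here is a minimal pure-Python sketch of the standard pair-counting ARI formula. The function name `adjusted_rand_index` and the toy label lists are illustrative assumptions, not taken from the paper, and the sketch does not compute DCSI itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points.

    Illustrative helper (name and interface are assumptions); it follows
    the standard contingency-table formulation of the ARI.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings agree trivially

    # Contingency-table cells and the row/column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs placed together in both, or in each, labeling.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2         # upper bound on agreement

    if max_index == expected:
        # Only happens when both labelings are the same trivial partition
        # (one single cluster, or all singletons).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    classes = [0, 0, 1, 1]
    # Same partition with swapped label names.
    print(adjusted_rand_index(classes, [1, 1, 0, 0]))
    # A clustering that splits every class across both clusters.
    print(adjusted_rand_index(classes, [0, 1, 0, 1]))
```

An ARI of 1 means the clustering reproduces the class labels up to renaming, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.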

arXiv information

Authors	Jana Gauss, Fabian Scheipl, Moritz Herrmann
Published	2024-07-01 14:04:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
