Consistent spectral clustering in sparse tensor block models

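Summary

High-order clustering aims to classify objects in multiway datasets, which are common in fields such as bioinformatics, social network analysis, and recommendation systems.
These tasks often involve sparse, high-dimensional data, which poses significant statistical and computational challenges.
The paper introduces a tensor block model designed specifically for sparse integer-valued data tensors.
It proposes a simple spectral clustering algorithm with an added trimming step that mitigates noise fluctuations, and it identifies a density threshold that guarantees the algorithm's consistency.
The approach models sparsity with a sub-Poisson noise concentration framework, which accommodates tails heavier than sub-Gaussian.
Notably, this natural class of tensor block models is closed under aggregation across arbitrary modes.
As a result, the authors obtain a comprehensive framework for evaluating the tradeoff between signal loss and noise reduction during data aggregation.
The analysis rests on a novel concentration bound for sparse random Gram matrices.
The theoretical findings are illustrated through simulation experiments.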

Abstract (original)

High-order clustering aims to classify objects in multiway datasets that are prevalent in various fields such as bioinformatics, social network analysis, and recommendation systems. These tasks often involve data that is sparse and high-dimensional, presenting significant statistical and computational challenges. This paper introduces a tensor block model specifically designed for sparse integer-valued data tensors. We propose a simple spectral clustering algorithm augmented with a trimming step to mitigate noise fluctuations, and identify a density threshold that ensures the algorithm’s consistency. Our approach models sparsity using a sub-Poisson noise concentration framework, accommodating heavier than sub-Gaussian tails. Remarkably, this natural class of tensor block models is closed under aggregation across arbitrary modes. Consequently, we obtain a comprehensive framework for evaluating the tradeoff between signal loss and noise reduction during data aggregation. The analysis is based on a novel concentration bound for sparse random Gram matrices. The theoretical findings are illustrated through simulation experiments.
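The abstract names the ingredients of the method (a flattened data tensor, a trimming step, a Gram matrix, and spectral clustering) without giving details. The following is a minimal sketch of that general idea in Python with NumPy, not the authors' exact algorithm. Everything specific here is an assumption made for illustration: the Poisson sampling model, the block rates, the row-sum trimming rule with `trim_factor`, the zeroed Gram diagonal, and the small k-means helper.

```python
import numpy as np


def sample_block_tensor(labels, rates, rng):
    """Sample an order-3 Poisson tensor whose mean is constant on blocks."""
    # mean[i, j, k] = rates[labels[i], labels[j], labels[k]]
    mean = rates[np.ix_(labels, labels, labels)]
    return rng.poisson(mean)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = points[assign == c].mean(axis=0)
    return assign


def spectral_cluster(tensor, n_clusters, trim_factor=3.0):
    """Cluster the first mode of a count tensor (illustrative sketch only)."""
    n = tensor.shape[0]
    flat = tensor.reshape(n, -1).astype(float)  # mode-1 flattening
    # Assumed trimming rule: zero out rows with unusually large total counts.
    row_sums = flat.sum(axis=1)
    flat[row_sums > trim_factor * row_sums.mean()] = 0.0
    gram = flat @ flat.T
    np.fill_diagonal(gram, 0.0)  # drop the noisy diagonal of the Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(np.abs(eigvals))[-n_clusters:]
    embedding = eigvecs[:, top] * eigvals[top]  # rank-r spectral embedding
    return kmeans(embedding, n_clusters)


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)          # two hypothetical clusters of 30 nodes
rates = np.full((2, 2, 2), 0.02)        # background rate (assumed)
rates[0, 0, 0] = rates[1, 1, 1] = 0.3   # denser within-cluster blocks (assumed)
tensor = sample_block_tensor(labels, rates, rng)

estimated = spectral_cluster(tensor, n_clusters=2)
# Aggregating over the last mode gives a count matrix with the same block
# structure, so the same routine can be applied to it.
aggregated = tensor.sum(axis=2)
estimated_agg = spectral_cluster(aggregated, n_clusters=2)
```

Comparing `estimated` and `estimated_agg` against `labels` (up to a relabeling of the clusters) gives a rough feel for the aggregation tradeoff the abstract describes: summing over a mode reduces the number of noisy entries but can also blur the differences between blocks.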

arXiv information

Authors	Ian Välimaa, Lasse Leskelä
Published	2025-01-23 16:41:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
