Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation

Summary

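Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels as each new data point or batch is processed. Although faster and more versatile than offline methods, online clustering can easily reach a collapsed solution in which the encoder maps all inputs to the same point and everything is placed in a single cluster. Successful existing models use various techniques to avoid this problem, but most of them either require data augmentation or aim to make the average soft assignment over the whole dataset equal for each cluster. We propose a method that requires no data augmentation and, unlike existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can easily be incorporated into the training of the encoder network. Tested on four image datasets and one human activity recognition dataset, it avoids collapse more robustly and consistently than other methods and yields more accurate clustering. We also conduct further experiments and analyses that justify our choice to regularize the hard cluster assignments. Code is available at https://github.com/Lou1sM/online_hard_clustering.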

Summary (original)

Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which require data augmentation or which aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets and one human-activity recognition dataset, it consistently avoids collapse more robustly than other methods and leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments. Code is available at https://github.com/Lou1sM/online_hard_clustering.
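The distinction the abstract draws between average soft assignments and hard assignments can be illustrated with a small numerical sketch. This is not the authors' method or code; the batch below is a hypothetical construction, and the helper names `mean_soft_assignment` and `hard_assignment_fractions` are introduced here only for illustration. It shows a case where the mean soft assignment is close to uniform even though every point's hard (argmax) assignment falls in the same cluster.

```python
import numpy as np


def mean_soft_assignment(soft):
    """Average soft assignment per cluster over the batch (columns of `soft`)."""
    return soft.mean(axis=0)


def hard_assignment_fractions(soft):
    """Fraction of points whose argmax (hard) assignment is each cluster."""
    n_clusters = soft.shape[1]
    counts = np.bincount(soft.argmax(axis=1), minlength=n_clusters)
    return counts / counts.sum()


# Hypothetical batch (an assumption for illustration): 1000 points, 4 clusters,
# and every point prefers cluster 0 by a small margin.
n_points, n_clusters = 1000, 4
soft = np.full((n_points, n_clusters), 1.0 / n_clusters)
soft[:, 0] += 0.03   # slight preference for cluster 0
soft[:, 1:] -= 0.01  # keep each row summing to 1

mean_soft = mean_soft_assignment(soft)
hard_frac = hard_assignment_fractions(soft)

print("mean soft assignment:     ", mean_soft)
print("hard assignment fractions:", hard_frac)
```

In this constructed batch, a check on the mean soft assignment alone would see only a small deviation from uniform, while the hard assignments have fully collapsed into one cluster. That gap is one way to read the abstract's motivation for regularizing hard assignments directly.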

arXiv information

Authors	Louis Mahon, Thomas Lukasiewicz
Published	2024-03-01 10:22:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
