P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced Clustering

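Summary

Deep clustering, which learns representations and semantic clusters without label information, remains a major challenge for deep learning-based approaches.
Despite significant recent progress, most existing methods focus on uniformly distributed datasets, which substantially limits their practical applicability.
In this paper, we first introduce a more practical problem setting, called deep imbalanced clustering, in which the underlying classes follow an imbalanced distribution.
To tackle this problem, we propose a novel pseudo-label-based learning framework.
Our framework formulates pseudo-label generation as a progressive partial optimal transport problem: it progressively transports each sample to imbalanced clusters under a prior distribution constraint, thereby generating imbalance-aware pseudo-labels and learning from high-confidence samples.
In addition, we transform the initial formulation into an unbalanced optimal transport problem with augmented constraints, which can be solved efficiently by a fast matrix scaling algorithm.
Experiments on various datasets, including a human-curated long-tailed CIFAR100, the challenging ImageNet-R, and large-scale subsets of the fine-grained iNaturalist2018 dataset, demonstrate the superiority of our method.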
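
To make the "matrix scaling" idea concrete, here is a minimal sketch of plain entropic optimal transport (Sinkhorn-style scaling) used to turn model logits into pseudo-labels under a non-uniform cluster prior. It is not the P$^2$OT algorithm itself: the progressive schedule, the partial-mass constraint, and the unbalanced reformulation described in the abstract are all omitted. The function names, the `eps` and `n_iters` parameters, and the example prior are assumptions made for illustration only.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def sinkhorn_plan(logits, cluster_prior, eps=0.5, n_iters=200):
    """Entropic OT plan between n samples and k clusters (illustrative only).

    Rows are scaled towards mass 1/n per sample and columns towards the
    given cluster prior, by alternating the two scaling vectors.
    """
    n, k = logits.shape
    cost = -log_softmax(logits)        # low cost where the model is confident
    kernel = np.exp(-cost / eps)       # Gibbs kernel of the cost matrix
    row = np.full(n, 1.0 / n)          # each sample carries equal mass
    col = np.asarray(cluster_prior, dtype=float)
    col = col / col.sum()              # normalise the (assumed) prior
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = row / (kernel @ v)         # rescale rows
        v = col / (kernel.T @ u)       # rescale columns
    return u[:, None] * kernel * v[None, :]


# Hypothetical usage: 12 samples, 3 clusters, long-tailed prior.
rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 3))
plan = sinkhorn_plan(logits, cluster_prior=[0.6, 0.3, 0.1])
pseudo_labels = plan.argmax(axis=1)    # hard pseudo-label per sample
```

Because the loop ends on the column update, the column sums of the returned plan match the normalised prior, so the resulting pseudo-labels respect the assumed class imbalance instead of being pushed towards equal cluster sizes.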

Summary (original)

Deep clustering, which learns representation and semantic clustering without labels information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting the practical applicability of their methods. In this paper, we first introduce a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalance distribution. To tackle this problem, we propose a novel pseudo-labeling-based learning framework. Our framework formulates pseudo-label generation as a progressive partial optimal transport problem, which progressively transports each sample to imbalanced clusters under prior distribution constraints, thus generating imbalance-aware pseudo-labels and learning from high-confident samples. In addition, we transform the initial formulation into an unbalanced optimal transport problem with augmented constraints, which can be solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of fine-grained iNaturalist2018 datasets, demonstrate the superiority of our method.

arXiv information

Authors	Chuyu Zhang, Hui Ren, Xuming He
Published	2024-01-17 15:15:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
