Enhancing Representation Learning on High-Dimensional, Small-Size Tabular Data: A Divide and Conquer Method with Ensembled VAEs

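Summary

Variational autoencoders (VAEs) and their many variants have shown an impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance.
However, many current methods struggle to learn good representations in high-dimensional, low-sample-size (HDLSS) tasks, an inherently challenging setting.
We address this challenge by using an ensemble of lightweight VAEs to learn posteriors over subsets of the feature space, which are then aggregated into a joint posterior in a novel divide-and-conquer approach.
Specifically, we propose an alternative factorisation of the joint posterior that induces a form of implicit data augmentation, yielding greater sample efficiency.
Through a series of experiments on eight real-world datasets, we show that our method learns better latent representations in HDLSS settings, leading to higher accuracy in a downstream classification task.
Furthermore, we verify that our approach has a positive effect on disentanglement and achieves a lower estimated total correlation on the learned representations.
Finally, we show that our approach is robust to partial features at inference, exhibiting little performance degradation even when most features are missing.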
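The abstract does not say how the per-subset posteriors are combined into the joint posterior. As an illustration only, the sketch below assumes each lightweight VAE outputs a diagonal Gaussian q_k(z | x_k) and fuses them with a product of experts (PoE), a common rule in multimodal VAEs that may well differ from the factorisation proposed in the paper. The function name `product_of_experts` and its `mask` parameter are hypothetical. Masking out experts whose feature subsets are missing hints at how an ensemble over feature subsets can still produce a posterior from partial features at inference.

```python
import numpy as np

def product_of_experts(mus, logvars, mask=None):
    """Hypothetical helper: fuse K diagonal-Gaussian posteriors into one Gaussian.

    mus, logvars: shape (K, D), one row per subset VAE encoder q_k(z | x_k).
    mask: optional bool array of shape (K,); False drops an expert, e.g. when
          its feature subset is missing at inference time.
    A standard-normal prior expert N(0, I) is always included, so the result
    falls back to the prior when every expert is masked out.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if mask is None:
        mask = np.ones(mus.shape[0], dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    precisions = np.exp(-logvars[mask])              # 1 / sigma_k^2 per expert
    total_precision = 1.0 + precisions.sum(axis=0)   # prior adds precision 1
    mu = (precisions * mus[mask]).sum(axis=0) / total_precision  # prior mean is 0
    var = 1.0 / total_precision
    return mu, var

# Usage: two experts over a 1-D latent, both with unit variance.
mu, var = product_of_experts([[1.0], [3.0]], [[0.0], [0.0]])
print(mu, var)  # [1.33333333] [0.33333333]
mu, var = product_of_experts([[1.0], [3.0]], [[0.0], [0.0]], mask=[True, False])
print(mu, var)  # [0.5] [0.5]
```

Including a prior expert follows common practice in PoE-based multimodal VAEs; the abstract does not specify whether the paper does anything similar.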

Summary (original)

Variational Autoencoders and their many variants have displayed impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance. Many current methods, however, struggle to learn good representations in High Dimensional, Low Sample Size (HDLSS) tasks, which is an inherently challenging setting. We address this challenge by using an ensemble of lightweight VAEs to learn posteriors over subsets of the feature-space, which get aggregated into a joint posterior in a novel divide-and-conquer approach. Specifically, we present an alternative factorisation of the joint posterior that induces a form of implicit data augmentation that yields greater sample efficiency. Through a series of experiments on eight real-world datasets, we show that our method learns better latent representations in HDLSS settings, which leads to higher accuracy in a downstream classification task. Furthermore, we verify that our approach has a positive effect on disentanglement and achieves a lower estimated Total Correlation on learnt representations. Finally, we show that our approach is robust to partial features at inference, exhibiting little performance degradation even with most features missing.
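The abstract reports a lower estimated total correlation (TC) on the learnt representations but does not name the estimator. A minimal sketch, assuming a Gaussian fit to the latent codes: for a Gaussian with covariance S, TC = sum_j H(z_j) - H(z) reduces to 0.5 * (sum_j log S_jj - log det S), which is zero exactly when S is diagonal. The helper `gaussian_total_correlation` is hypothetical, and because a Gaussian fit only captures linear dependence it is a rough proxy, not necessarily the metric used in the paper.

```python
import numpy as np

def gaussian_total_correlation(z):
    """Hypothetical helper: estimate TC(z) = sum_j H(z_j) - H(z) under a Gaussian fit.

    z: array of shape (N, D), e.g. posterior means of N samples.
    For a Gaussian with covariance S the entropies give
    TC = 0.5 * (sum_j log S_jj - log det S), which is >= 0 and zero iff S is diagonal.
    """
    z = np.asarray(z, dtype=float)
    cov = np.atleast_2d(np.cov(z, rowvar=False))  # (D, D) sample covariance
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("sample covariance is not positive definite")
    return 0.5 * (np.log(np.diag(cov)).sum() - logdet)

# Usage: correlated codes give a larger TC than independent ones.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
independent = np.column_stack([x, rng.standard_normal(100_000)])
correlated = np.column_stack([x, x + rng.standard_normal(100_000)])
print(gaussian_total_correlation(independent))  # close to 0
print(gaussian_total_correlation(correlated))   # close to 0.5 * log(2), about 0.347
```

In the correlated case the two columns have correlation 1/sqrt(2), and for two Gaussian variables TC = -0.5 * log(1 - rho^2), which gives 0.5 * log(2).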

arXiv information

Authors	Navindu Leelarathna, Andrei Margeloiu, Mateja Jamnik, Nikola Simidjievski
Published	2023-06-27 17:55:31+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google