Unsupervised detection of semantic correlations in big data

Summary

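In real-world data, information is stored in very large feature vectors.
These variables are usually correlated because of complex interactions that involve many features at once.
Such correlations correspond qualitatively to semantic roles and are recognized naturally by both the human brain and artificial neural networks.
This recognition makes it possible, for example, to predict missing parts of an image or a text from their context.
We present a method for detecting these correlations in high-dimensional data represented as binary numbers.
We estimate the binary intrinsic dimension of a dataset, which quantifies the minimum number of independent coordinates needed to describe the data and is therefore a proxy for semantic complexity.
Because the proposed algorithm is largely insensitive to the so-called curse of dimensionality, it can be used in big-data analysis.
We test this approach by identifying phase transitions in model magnetic systems, and then apply it to detecting semantic correlations of images and text inside deep neural networks.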
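The summary works with data that are already binary. As a minimal sketch, assuming real-valued features (for example, hidden-layer activations of a network) are binarized by their sign, which is an assumption of this example and not an encoding stated in the summary, the snippet turns feature vectors into bits and computes all pairwise Hamming distances with one integer matrix product. The helper names `binarize_by_sign` and `pairwise_hamming` are hypothetical.

```python
import numpy as np


def binarize_by_sign(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map real features of shape (n, D) to bits (1 if > 0, else 0).

    Sign thresholding is an assumption of this sketch, not a prescribed encoding.
    """
    return (features > 0).astype(np.uint8)


def pairwise_hamming(bits: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (n, n) matrix of Hamming distances between rows of a 0/1 array.

    For binary vectors, d(x, y) = |x| + |y| - 2 <x, y>, so the whole matrix
    comes from one integer matrix product instead of a Python loop over pairs.
    """
    b = bits.astype(np.int64)
    ones = b.sum(axis=1)   # |x|: number of 1-bits in each row
    overlap = b @ b.T      # <x, y>: positions where both rows are 1
    return ones[:, None] + ones[None, :] - 2 * overlap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(5, 1000))  # stand-in for network activations
    dist = pairwise_hamming(binarize_by_sign(acts))
    print(dist)  # symmetric, zeros on the diagonal
```

The identity avoids materializing an (n, n, D) comparison array, which matters when D is large; the cost is still proportional to n² D arithmetic operations.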

Summary (original)

In real-world data, information is stored in extremely large feature vectors. These variables are typically correlated due to complex interactions involving many features simultaneously. Such correlations qualitatively correspond to semantic roles and are naturally recognized by both the human brain and artificial neural networks. This recognition enables, for instance, the prediction of missing parts of an image or text based on their context. We present a method to detect these correlations in high-dimensional data represented as binary numbers. We estimate the binary intrinsic dimension of a dataset, which quantifies the minimum number of independent coordinates needed to describe the data, and is therefore a proxy of semantic complexity. The proposed algorithm is largely insensitive to the so-called curse of dimensionality, and can therefore be used in big data analysis. We test this approach identifying phase transitions in model magnetic systems and we then apply it to the detection of semantic correlations of images and text inside deep neural networks.
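The abstract defines the binary intrinsic dimension only informally, as the minimum number of independent coordinates needed to describe the data. As a toy illustration of that idea, and explicitly not the estimator used by the authors, the sketch assumes that pairwise Hamming distances follow a binomial law Binomial(d, p) and recovers d by matching the first two moments: mean = d·p and variance = d·p·(1 − p), hence p = 1 − variance/mean and d = mean/p. The function names are hypothetical.

```python
import numpy as np


def pairwise_hamming_upper(bits: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Hamming distances for all pairs i < j of rows of a 0/1 array."""
    b = bits.astype(np.int64)
    ones = b.sum(axis=1)
    dist = ones[:, None] + ones[None, :] - 2 * (b @ b.T)
    i, j = np.triu_indices(len(b), k=1)
    return dist[i, j]


def binomial_moment_dimension(bits: np.ndarray) -> float:
    """Toy estimate of the number of independent binary coordinates.

    Assumes distances ~ Binomial(d, p): mean = d*p, var = d*p*(1-p),
    so p = 1 - var/mean and d = mean/p. This is NOT the paper's algorithm.
    """
    r = pairwise_hamming_upper(bits).astype(np.float64)
    mean, var = r.mean(), r.var()
    if mean == 0:
        raise ValueError("all points are identical")
    p = 1.0 - var / mean
    if p <= 0:
        raise ValueError("variance >= mean: binomial model does not fit these distances")
    return mean / p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_true, D = 400, 20, 200
    data = np.zeros((n, D), dtype=np.uint8)                   # 200 ambient bits...
    data[:, :d_true] = rng.integers(0, 2, size=(n, d_true))   # ...only 20 of them vary
    print(binomial_moment_dimension(data))  # close to 20, not 200
```

Padding with constant bits leaves every distance unchanged, so the estimate tracks the 20 varying bits rather than the 200 ambient ones. Because the toy sees only the first two moments of the distance distribution, it misses many correlation structures: if bits are exact copies of other bits, the variance exceeds the mean and the model fails outright.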

arXiv information

Authors	Santiago Acevedo, Alex Rodriguez, Alessandro Laio
Published	2025-03-07 15:21:42+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
