Discovering Universal Geometry in Embeddings with ICA

Summary

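This study uses Independent Component Analysis (ICA) to reveal a consistent semantic structure within word and image embeddings.
The approach extracts independent semantic components from the embeddings of a pre-trained model by exploiting the anisotropic information that remains after the whitening step of Principal Component Analysis (PCA).
The authors show that each embedding can be expressed as a composition of a few intrinsic, interpretable axes, and that these semantic axes stay consistent across languages, algorithms, and modalities.
Finding a universal semantic structure in the geometric patterns of embeddings deepens our understanding of the representations they encode.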

Abstract (original)

This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
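
The two-step procedure described in the abstract (PCA whitening, then ICA on what remains) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic Laplace-distributed sources stand in for real embeddings, the mixing matrix `A` is made up, and the helper names `pca_whiten`, `sym_orth`, and `fastica_rotation` are hypothetical. The ICA step uses a symmetric FastICA update with a tanh contrast, chosen here only because it is a common default.

```python
import numpy as np

def pca_whiten(X):
    """Center X (n_samples, n_dims) and rescale along principal axes
    so that the sample covariance becomes the identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_orth(W):
    """Symmetric orthogonalization: nearest orthogonal matrix to W."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fastica_rotation(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh contrast) on whitened data Z.
    Returns an orthogonal matrix R whose rows are the independent axes."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = sym_orth(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current component estimates
        G = np.tanh(Y)                   # contrast nonlinearity g
        G_prime = 1.0 - G ** 2           # its derivative g'
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, for every row w
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = sym_orth(W_new)
        # Converged when each new row is parallel to the old one
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W

# Synthetic stand-in for embeddings (assumption): three independent,
# heavy-tailed sources mixed by an arbitrary invertible matrix.
rng = np.random.default_rng(42)
S = rng.laplace(size=(5000, 3))
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.6, 1.0]])
X = S @ A.T

Z = pca_whiten(X)            # second-order structure removed
R = fastica_rotation(Z)      # rotation found from the remaining non-Gaussianity
S_est = Z @ R.T              # estimated independent components
```

After whitening, the covariance of `Z` is the identity, so any rotation of `Z` looks the same to PCA. ICA picks the particular rotation that makes the components as non-Gaussian as possible, which is the "anisotropic information that remains" referred to in the abstract. The recovered components match the sources only up to permutation and sign.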

arXiv information

Authors	Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira
Published	2023-11-02 16:03:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
