Linear Spaces of Meanings: the Compositional Language of VLMs



We investigate compositional structures in vector data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a pre-existing vocabulary. In contrast, we seek to approximate label representations from a text encoder as combinations of a smaller set of vectors in the embedding space. These vectors can be seen as ‘ideal words’ which can be used to generate new concepts in an efficient way. We present a theoretical framework for understanding linear compositionality, drawing connections with mathematical representation theory and previous definitions of disentanglement. We provide theoretical and empirical evidence that ideal words provide good compositional approximations of composite concepts and can be more effective than token-based decompositions of the same concepts.
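The idea of approximating composite label embeddings with a smaller set of vectors can be illustrated with a minimal sketch. It assumes, purely for illustration, that each label is a pair of factors (for example an attribute and an object) and that an additive model is fitted by averaging over each factor, which is the least-squares fit for this setup. The function names `ideal_words` and `compose`, and the synthetic data standing in for text-encoder outputs, are hypothetical and not taken from the paper.

```python
import numpy as np

def ideal_words(emb):
    """Fit an additive decomposition to composite-label embeddings.

    emb has shape (n_a, n_b, d); emb[i, j] is the embedding of the
    composite label (a_i, b_j). Returns a shared offset u0 and one
    vector per factor value (hypothetical helper, for illustration).
    """
    u0 = emb.mean(axis=(0, 1))       # global mean, shape (d,)
    ua = emb.mean(axis=1) - u0       # one vector per a_i, shape (n_a, d)
    ub = emb.mean(axis=0) - u0       # one vector per b_j, shape (n_b, d)
    return u0, ua, ub

def compose(u0, ua, ub):
    """Rebuild all n_a * n_b composite embeddings from the factor vectors."""
    return u0 + ua[:, None, :] + ub[None, :, :]

# Synthetic stand-in for text-encoder outputs (assumption: real
# embeddings would come from a VLM text encoder instead).
rng = np.random.default_rng(0)
n_a, n_b, d = 3, 4, 8
true_a = rng.normal(size=(n_a, d))
true_b = rng.normal(size=(n_b, d))
noise = 0.01 * rng.normal(size=(n_a, n_b, d))
emb = true_a[:, None, :] + true_b[None, :, :] + noise

u0, ua, ub = ideal_words(emb)
approx = compose(u0, ua, ub)

# Relative error of the compositional approximation.
rel_err = np.linalg.norm(approx - emb) / np.linalg.norm(emb)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch, n_a + n_b + 1 vectors stand in for n_a * n_b composite embeddings, which is the sense in which a small set of vectors can generate new concepts efficiently. How well the approximation holds for real VLM embeddings is an empirical question that the sketch does not address.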


Authors: Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Bing Xiang, Stefano Soatto
Published: 2023-02-28 08:11:56+00:00


Categories: cs.AI, cs.CL, cs.CV, cs.LG