Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models

Summary

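Self-supervised word embedding algorithms such as word2vec offer a minimal setting for studying representation learning in language modeling.
We study the quartic Taylor approximation of the word2vec loss around the origin and show that both the resulting training dynamics and the final downstream-task performance are empirically very close to those of word2vec.
Our main contribution is an analytical solution for both the gradient-flow training dynamics and the final word embeddings, expressed solely in terms of corpus statistics and training hyperparameters.
The solutions show that these models learn orthogonal linear subspaces one at a time, with each one increasing the effective rank of the embeddings until model capacity is saturated.
When training on Wikipedia, we find that each of the top linear subspaces represents an interpretable topic-level concept.
Finally, we apply our theory to describe how linear representations of more abstract semantic concepts emerge during training.
These representations can be used to complete analogies via vector addition.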

Summary (original)

Self-supervised word embedding algorithms such as word2vec provide a minimal setting for studying representation learning in language modeling. We examine the quartic Taylor approximation of the word2vec loss around the origin, and we show that both the resulting training dynamics and the final performance on downstream tasks are empirically very similar to those of word2vec. Our main contribution is to analytically solve for both the gradient flow training dynamics and the final word embeddings in terms of only the corpus statistics and training hyperparameters. The solutions reveal that these models learn orthogonal linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. Training on Wikipedia, we find that each of the top linear subspaces represents an interpretable topic-level concept. Finally, we apply our theory to describe how linear representations of more abstract semantic concepts emerge during training; these can be used to complete analogies via vector addition.
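
The closing claim of the abstract, that analogies can be completed via vector addition, can be illustrated with a minimal sketch. The vocabulary, the hand-written three-dimensional embeddings, and the helper name `complete_analogy` are all hypothetical and chosen only for illustration; they are not taken from the paper, and the scoring rule (cosine similarity to `b - a + c`) is the commonly used analogy heuristic rather than a method the abstract specifies.

```python
import numpy as np


def complete_analogy(a, b, c, vocab, emb):
    """Return the word d that best completes 'a is to b as c is to d'."""
    index = {word: i for i, word in enumerate(vocab)}
    # Normalize rows so that dot products are cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Vector addition: shift c by the offset that maps a to b.
    query = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ query
    # Exclude the three query words from the candidates.
    for word in (a, b, c):
        scores[index[word]] = -np.inf
    return vocab[int(np.argmax(scores))]


# Hypothetical toy embeddings: axis 1 loosely encodes gender, axis 2 royalty.
vocab = ["man", "woman", "king", "queen", "child", "apple"]
emb = np.array([
    [1.0, 0.0, 0.0],   # man
    [1.0, 1.0, 0.0],   # woman
    [1.0, 0.0, 1.0],   # king
    [1.0, 1.0, 1.0],   # queen
    [1.0, 0.5, 0.0],   # child
    [0.0, 0.0, -1.0],  # apple
])

result = complete_analogy("man", "woman", "king", vocab, emb)
print(result)
```

With trained embeddings, `emb` would be the learned embedding matrix and `vocab` the corresponding word list; the toy values above only make the linear structure easy to see.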

arXiv information

Authors	Dhruva Karkada, James B. Simon, Yasaman Bahri, Michael R. DeWeese
Published	2025-05-28 15:55:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
