Probing the topology of the space of tokens with structured prompts

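Summary

This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism.
Moreover, it provides strong theoretical justification for why the method should be expected to work, in the form of a mathematical proof for generic LLMs.
With this method in hand, the authors demonstrate its effectiveness by recovering the token subspace of Llemma-7B.
The results apply not only to LLMs but also to general nonlinear autoregressive processes.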

Abstract (original)

This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism. Moreover, this article provides strong theoretical justification — a mathematical proof for generic LLMs — for why this method should be expected to work. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of Llemma-7B. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.

arXiv information

Authors	Michael Robinson, Sourya Dey, Taisa Kushner
Published	2025-03-19 17:01:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
