Measuring and Manipulating Knowledge Representations in Language Models

Summary

Title: Measuring and Manipulating Knowledge Representations in Language Models

Summary:

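– Neural language models (LMs) represent facts about the world described by text. These facts sometimes derive from training data and sometimes from the input text itself.
– Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: they make it possible to update LMs when the world changes, to localize and remove sources of bias, and to identify errors in generated text.
– REMEDI is an approach for querying and modifying factual knowledge in LMs. It learns a map from textual queries to fact encodings in an LM's internal representation system. These encodings can be used as knowledge editors: adding them to LM hidden representations modifies downstream generation to be consistent with new facts.
– REMEDI encodings can also be used as model probes: comparing them to LM representations reveals what properties an LM attributes to mentioned entities, and predicts when the LM will generate outputs that conflict with background knowledge or input text.
– REMEDI links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.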
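
The editing idea in the list above (map a textual query to a fact encoding, then add that encoding to a hidden representation) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the affine form of the map, the vector sizes, the hand-picked weights, and the function names `affine_map` and `apply_edit` are hypothetical and are not taken from the REMEDI paper or its code.

```python
# Toy sketch only: the dimensions, weights, and function names below are
# hypothetical and are not taken from the REMEDI paper or its code.

def affine_map(weights, bias, query_vec):
    """Map a query vector to a fact encoding: weights @ query_vec + bias."""
    return [
        sum(w * q for w, q in zip(row, query_vec)) + b
        for row, b in zip(weights, bias)
    ]

def apply_edit(hidden, encoding):
    """Edit a hidden representation by adding the fact encoding to it."""
    return [h + e for h, e in zip(hidden, encoding)]

# Assumed stand-ins: a 3-dimensional hidden state for an entity mention,
# and a 2-dimensional embedding of a textual query describing a new fact.
hidden_state = [0.5, -1.0, 2.0]
query_embedding = [1.0, 2.0]

# Hypothetical "learned" parameters of the map (fixed by hand here).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.0, 0.5, -1.0]

fact_encoding = affine_map(W, b, query_embedding)
edited_state = apply_edit(hidden_state, fact_encoding)

print(fact_encoding)  # [1.0, 2.5, 2.0]
print(edited_state)   # [1.5, 1.5, 4.0]
```

In a real LM the hidden state would come from a chosen layer and token position, and the map would be trained rather than fixed by hand; the sketch only shows the additive form of the edit.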

Abstract (original)

Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that bananas are fruits). Sometimes facts derive from input text itself (a representation of the sentence ‘I poured out the bottle’ encodes the fact that the bottle became empty). Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: making it possible to update them when the world changes, to localize and remove sources of bias, and to identify errors in generated text. We describe REMEDI, an approach for querying and modifying factual knowledge in LMs. REMEDI learns a map from textual queries to fact encodings in an LM’s internal representation system. These encodings can be used as knowledge editors: by adding them to LM hidden representations, we can modify downstream generation to be consistent with new facts. REMEDI encodings can also be used as model probes: by comparing them to LM representations, we can ascertain what properties LMs attribute to mentioned entities, and predict when they will generate outputs that conflict with background knowledge or input text. REMEDI thus links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.
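
The probing use described in the abstract (comparing fact encodings to LM representations) can be sketched in the same toy style. The abstract does not say how the comparison is computed, so the dot-product score here is an assumption, as are the example vectors, the candidate attribute strings, and the helper names `dot` and `probe`.

```python
# Toy sketch only: the dot-product comparison, the vectors, and the names
# below are assumptions for illustration, not the paper's actual method.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def probe(hidden, candidate_encodings):
    """Score each candidate fact encoding against a hidden representation
    and return the best-scoring attribute along with all scores."""
    scores = {name: dot(hidden, enc) for name, enc in candidate_encodings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Assumed stand-in for an LM's representation of the word "banana".
hidden_state = [1.0, 0.0, 2.0]

# Hypothetical fact encodings for two candidate attributes.
candidates = {
    "is a fruit": [1.0, 0.0, 1.0],
    "is a vehicle": [-1.0, 1.0, 0.0],
}

best, scores = probe(hidden_state, candidates)
print(best)    # is a fruit
print(scores)  # {'is a fruit': 3.0, 'is a vehicle': -1.0}
```

A low score for an attribute that the input text or background knowledge asserts would be the kind of signal that could flag a likely conflicting output, under these assumptions.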

arXiv information

Authors	Evan Hernandez, Belinda Z. Li, Jacob Andreas
Published	2023-04-03 06:24:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
