Truth-value judgment in language models: belief directions are context sensitive

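Summary

Recent work has shown that the latent spaces of large language models (LLMs) contain directions that predict whether a sentence is true.
Several methods recover such directions and use them to build probes, which are described as capturing a model's "knowledge" or "beliefs".
We investigate this phenomenon by examining closely how context affects the probes.
Our experiments establish where in the LLM the probe's predictions can be described as conditional on the preceding (related) sentences.
Specifically, we quantify how responsive the probes are to the presence of (negated) supporting and contradicting sentences, and we score the probes on their consistency.
We also run a causal intervention experiment, testing whether moving the representation of a premise along these belief directions influences the position of the hypothesis along the same direction.
We find that the probes we test are generally context sensitive, but that contexts which should not affect truth often still change the probe outputs.
Our experiments show that the type of errors depends on the layer, the (type of) model, and the kind of data.
Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information.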

Summary (original)

Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model’s ‘knowledge’ or ‘beliefs’. We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe’s predictions can be described as being conditional on the preceding (related) sentences. Specifically, we quantify the responsiveness of the probes to the presence of (negated) supporting and contradicting sentences, and score the probes on their consistency. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these belief directions influences the position of the hypothesis along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the type of errors depends on the layer, the (type of) model, and the kind of data. Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information.
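
The summary mentions probes built on a "belief direction" and an intervention that moves a representation along that direction. As a rough illustration only, the sketch below uses toy two-dimensional vectors and a logistic readout; the names `probe_probability` and `shift_along`, the vectors, and the step size are all assumptions made for this example and are not taken from the paper. It shows only the shift itself. Measuring the effect on a hypothesis, as the paper does, would require re-running the model on the modified representation, which is not shown here.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def probe_probability(hidden, direction, bias=0.0):
    """Hypothetical logistic probe: sigmoid of the projection onto the direction."""
    return 1.0 / (1.0 + math.exp(-(dot(hidden, direction) + bias)))

def shift_along(hidden, direction, alpha):
    """Move a representation by alpha units along a unit-length direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy values (assumptions): a unit "belief direction" and one hidden state.
direction = normalize([3.0, 4.0])
premise_hidden = [0.5, -1.0]

before = probe_probability(premise_hidden, direction)
shifted = shift_along(premise_hidden, direction, alpha=2.0)
after = probe_probability(shifted, direction)

print(f"probe output before shift: {before:.3f}")
print(f"probe output after shift:  {after:.3f}")
```

Because the direction has unit length, shifting by `alpha` changes the projection onto it by exactly `alpha`, so the probe output rises when `alpha` is positive.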

arXiv information

Authors	Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
Published	2024-04-29 16:52:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
