Standards for Belief Representations in LLMs

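Summary

As large language models (LLMs) continue to demonstrate remarkable abilities across a wide range of domains, computer scientists are developing methods to understand their cognitive processes, in particular how (and whether) LLMs internally represent beliefs about the world.
However, the field currently lacks a unified theoretical foundation to support the study of belief in LLMs.
This article begins to fill that gap by proposing adequacy conditions for a representation in an LLM to count as belief-like.
We argue that the project of measuring belief in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, but also differs from it in ways that should change how belief is measured.
Accordingly, drawing on insights from philosophy and on contemporary machine learning practice, we establish four criteria that balance theoretical considerations with practical constraints.
The proposed criteria are accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs.
We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.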

Summary (original)

As large language models (LLMs) continue to demonstrate remarkable abilities across various domains, computer scientists are developing methods to understand their cognitive processes, particularly concerning how (and if) LLMs internally represent their beliefs about the world. However, this field currently lacks a unified theoretical foundation to underpin the study of belief in LLMs. This article begins filling this gap by proposing adequacy conditions for a representation in an LLM to count as belief-like. We argue that, while the project of belief measurement in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Thus, drawing from insights in philosophy and contemporary practices of machine learning, we establish four criteria that balance theoretical considerations with practical constraints. Our proposed criteria include accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.

arXiv information

Authors	Daniel A. Herrmann, Benjamin A. Levinstein
Published	2024-05-31 17:21:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
