Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark

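Summary

Recent progress in large language models (LLMs) has reinvigorated philosophical debate about artificial intelligence.
Two of the most fundamental challenges, the Frame Problem and the Symbol Grounding Problem, have historically been regarded as unsolvable within traditional symbolic AI systems.
This study examines whether modern LLMs have the cognitive capacities needed to address these problems.
To that end, two benchmark tasks reflecting the philosophical core of each problem were designed and administered under zero-shot conditions to 13 prominent LLMs (both closed and open-source), and the quality of each model's output was assessed over five trials.
Responses were scored on multiple criteria, including contextual reasoning, semantic coherence, and information filtering.
The results show that open-source models varied in performance with differences in model size, quantization, and instruction tuning, whereas several closed models consistently achieved high scores.
These findings suggest that some modern LLMs may be acquiring capacities sufficient to produce meaningful and stable responses to these long-standing theoretical challenges.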

Summary (original)

Recent advancements in large language models (LLMs) have revitalized philosophical debates surrounding artificial intelligence. Two of the most fundamental challenges – namely, the Frame Problem and the Symbol Grounding Problem – have historically been viewed as unsolvable within traditional symbolic AI systems. This study investigates whether modern LLMs possess the cognitive capacities required to address these problems. To do so, I designed two benchmark tasks reflecting the philosophical core of each problem, administered them under zero-shot conditions to 13 prominent LLMs (both closed and open-source), and assessed the quality of the models’ outputs across five trials each. Responses were scored along multiple criteria, including contextual reasoning, semantic coherence, and information filtering. The results demonstrate that while open-source models showed variability in performance due to differences in model size, quantization, and instruction tuning, several closed models consistently achieved high scores. These findings suggest that select modern LLMs may be acquiring capacities sufficient to produce meaningful and stable responses to these long-standing theoretical challenges.
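The evaluation protocol described above (several scoring criteria, five trials per model) could be aggregated roughly as follows. This is only a minimal sketch: the abstract does not give the scoring scale, the weighting of criteria, or the aggregation method, so the equal-weight average, the 0-10 scale, and the function names `trial_score` and `summarize` are all assumptions made for illustration. The numbers are placeholders, not results from the paper.

```python
from statistics import mean, pstdev

# Criteria named in the abstract; treating them as equally weighted
# scores on a 0-10 scale is an assumption, not the paper's rubric.
CRITERIA = ("contextual_reasoning", "semantic_coherence", "information_filtering")


def trial_score(scores):
    """Average the per-criterion scores of a single trial."""
    return mean(scores[c] for c in CRITERIA)


def summarize(trials):
    """Return (mean, population standard deviation) of the trial scores for one model."""
    totals = [trial_score(t) for t in trials]
    return mean(totals), pstdev(totals)


# Placeholder scores for one hypothetical model over five trials.
example_trials = [
    {"contextual_reasoning": 8, "semantic_coherence": 9, "information_filtering": 7},
    {"contextual_reasoning": 7, "semantic_coherence": 9, "information_filtering": 8},
    {"contextual_reasoning": 8, "semantic_coherence": 8, "information_filtering": 8},
    {"contextual_reasoning": 9, "semantic_coherence": 9, "information_filtering": 6},
    {"contextual_reasoning": 8, "semantic_coherence": 9, "information_filtering": 7},
]

if __name__ == "__main__":
    avg, spread = summarize(example_trials)
    print(f"mean score: {avg:.2f}, std dev across trials: {spread:.2f}")
```

Reporting the spread alongside the mean is one way to capture the stability of responses that the abstract emphasizes: a model with a high mean but a large deviation across its five trials would not count as consistently strong.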

arXiv information

Author	Shoko Oka
Published	2025-06-09 16:12:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
