Do Large Language Models Truly Understand Geometric Structures?

Summary

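Geometric ability poses a significant challenge for large language models (LLMs) because it requires advanced spatial comprehension and abstract thinking.
Existing datasets mainly evaluate LLMs on their final answers, which cannot measure genuine understanding of geometric structures, since an LLM can reach a correct answer by coincidence.
To fill this gap, we introduce the GeomRel dataset, designed to evaluate LLMs' understanding of geometric structures by isolating the core problem-solving step of identifying geometric relationships.
Using this benchmark, we conduct a thorough evaluation of a range of LLMs and identify key limitations in their understanding of geometric structures.
We further propose the Geometry Chain-of-Thought (GeoCoT) method, which strengthens LLMs' ability to identify geometric relationships and yields significant performance improvements.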

Summary (original)

Geometric ability is a significant challenge for large language models (LLMs) due to the need for advanced spatial comprehension and abstract thinking. Existing datasets primarily evaluate LLMs on their final answers, but they cannot truly measure their true understanding of geometric structures, as LLMs can arrive at correct answers by coincidence. To fill this gap, we introduce the GeomRel dataset, designed to evaluate LLMs’ understanding of geometric structures by isolating the core step of geometric relationship identification in problem-solving. Using this benchmark, we conduct thorough evaluations of diverse LLMs and identify key limitations in understanding geometric structures. We further propose the Geometry Chain-of-Thought (GeoCoT) method, which enhances LLMs’ ability to identify geometric relationships, resulting in significant performance improvements.

arXiv information

Authors	Xiaofeng Wang, Yiming Wang, Wenhong Zhu, Rui Wang
Published	2025-01-23 15:52:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
