Summary
Spatial reasoning in Large Language Models (LLMs) is the foundation of embodied intelligence. However, even in simple maze environments, LLMs still encounter challenges in long-term path planning, primarily due to spatial hallucination and the context-inconsistency hallucination induced by long-term reasoning.
To address this challenge, this study proposes an innovative model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL).
To counter the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations, and paths into chains of entity relations. This approach fully taps the potential of LLMs for sequential thinking.
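As a rough illustration of this transformation, the sketch below re-encodes a grid maze as natural-language relation facts of the kind the Spatial-to-Relational approach could feed to an LLM. The node naming, wall encoding, and sentence wording are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch: re-express a grid maze as entity-relation sentences so
# the LLM reasons over a relation chain instead of 2-D coordinates.
# Node names ("node_x_y") and the phrasing are hypothetical.

def maze_to_relations(walls, width, height):
    """Turn a grid maze into 'A is connected to B' relation facts."""
    relations = []
    for x in range(width):
        for y in range(height):
            for dx, dy in ((1, 0), (0, 1)):  # right and down neighbors
                nx, ny = x + dx, y + dy
                if nx < width and ny < height and ((x, y), (nx, ny)) not in walls:
                    relations.append(f"node_{x}_{y} is connected to node_{nx}_{ny}")
    return relations

# Example: a 2x2 maze with one wall between (0,0) and (1,0).
walls = {((0, 0), (1, 0))}
print("\n".join(maze_to_relations(walls, 2, 2)))
# A path is then a chain over these relations,
# e.g. node_0_0 -> node_0_1 -> node_1_1.
```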
Building on this representation, we design a Q-learning-based path-planning algorithm to mitigate the context-inconsistency hallucination and enhance the reasoning ability of LLMs. Using the Q-value of each state-action pair as auxiliary prompt information, we correct the hallucinations of LLMs and thereby guide them toward learning the optimal path.
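The sketch below shows one plausible way to combine the two pieces: a standard tabular Q-learning update plus a helper that verbalizes Q-values as an auxiliary prompt hint. The hyperparameters and the hint wording (format_q_hint) are assumptions made for illustration, not the paper's exact prompt.

```python
# Minimal sketch, under stated assumptions: tabular Q-learning whose
# learned state-action values are verbalized and appended to the
# LLM's planning prompt.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9        # learning rate and discount (assumed values)
Q = defaultdict(float)          # keyed by (state, action)

def q_update(state, action, reward, next_state, actions):
    """Standard one-step Q-learning update."""
    best_next = max((Q[(next_state, a)] for a in actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def format_q_hint(state, actions):
    """Verbalize Q-values so they can be appended to the LLM's prompt."""
    ranked = sorted(actions, key=lambda a: Q[(state, a)], reverse=True)
    lines = [f"move to {a}: estimated value {Q[(state, a)]:.2f}" for a in ranked]
    return f"From {state}, candidate moves ranked by learned value:\n" + "\n".join(lines)
```

The design intuition is that the verbalized values steer the LLM's next-step choice toward actions that have paid off in past episodes, rather than leaving it to hallucinate a move from the spatial description alone.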
Finally, we propose a reverse curriculum learning technique based on LLMs to further mitigate the context-inconsistency hallucination. By reducing task difficulty first, LLMs can rapidly accumulate successful experiences and then leverage them to tackle more complex tasks.
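A minimal sketch of the reverse-curriculum idea follows: episodes initially start one step from the goal, and the start state is moved progressively farther back once the LLM-guided agent succeeds reliably. The success threshold, trial count, and solve_episode interface are hypothetical.

```python
# Minimal sketch of reverse curriculum learning for path planning:
# easy successes near the goal are accumulated before harder starts.
# solve_episode(start) -> bool is an assumed callable that runs one
# LLM-guided planning episode from the given start state.

def reverse_curriculum(path_to_goal, solve_episode, success_threshold=0.8, trials=10):
    """path_to_goal: a known state sequence ending at the goal state."""
    for distance in range(1, len(path_to_goal)):
        start = path_to_goal[-1 - distance]   # begin 'distance' steps from the goal
        successes = sum(solve_episode(start) for _ in range(trials))
        if successes / trials < success_threshold:
            return distance  # stop widening; this stage needs more practice
    return len(path_to_goal) - 1  # full task solved from the original start
```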
We performed comprehensive experiments based on Baidu's self-developed LLM, ERNIE-Bot 4.0.
The results showed that our S2RCQL achieved a 23%–40% improvement in both success and optimality rates compared with advanced prompt engineering.
arXiv Information
Authors | Hourui Deng, Hongjie Zhang, Jie Ou, Chaosheng Feng |
Published | 2025-05-07 10:00:50+00:00 |
arXiv page | arxiv_id(pdf) |
Source, services used
arxiv.jp, Google