Generating Symbolic World Models via Test-time Scaling of Large Language Models

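Summary

Solving complex planning problems requires large language models (LLMs) to model state transitions explicitly so that they avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language.
To overcome this ambiguity, the Planning Domain Definition Language (PDDL) is used as a planning abstraction that enables precise, formal state descriptions.
With PDDL, we can generate a symbolic world model to which classical search algorithms such as A* can be applied seamlessly to find optimal plans.
However, because PDDL training data is scarce, generating PDDL domains directly with current LLMs remains an open challenge.
To address this challenge, we propose scaling up the test-time computation of LLMs to strengthen their PDDL reasoning, thereby enabling the generation of high-quality PDDL domains.
Specifically, we introduce a simple yet effective algorithm that first uses Best-of-N sampling to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning.
Our method outperforms o1-mini by a considerable margin in PDDL domain generation, achieving a success rate above 50% on two tasks (generating PDDL domains from natural language descriptions or from PDDL problems).
It does so without any additional training.
By using PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.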

Abstract (original)

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model the state transition to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome such ambiguity, Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. With PDDL, we can generate a symbolic world model where classic searching algorithms, such as A*, can be seamlessly applied to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge due to the lack of PDDL training data. To address this challenge, we propose to scale up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm, which first employs a Best-of-N sampling approach to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in the generation of PDDL domains, achieving over 50% success rate on two tasks (i.e., generating PDDL domains from natural language description or PDDL problems). This is done without requiring additional training. By taking advantage of PDDL as state abstraction, our method is able to outperform current state-of-the-art methods on almost all competition-level planning tasks.
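
The two-stage procedure in the abstract (Best-of-N sampling, then iterative refinement) can be illustrated with a minimal Python sketch. The abstract does not say how candidates are scored or what the refinement step looks like in practice, so everything here is an assumption made for illustration: `generate`, `score`, and `rewrite` are hypothetical callables standing in for LLM calls and a PDDL validity check, and the toy "domain" is just a space-separated list of Blocksworld-style action names. The refinement loop is a plain greedy accept-if-better loop, not the authors' verbalized machine learning procedure.

```python
import random
from typing import Callable

# Hypothetical toy target: action names a "complete" domain should mention.
REQUIRED = ["pick-up", "put-down", "stack", "unstack"]


def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int) -> str:
    """Sample n candidates and keep the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


def refine(candidate: str,
           score: Callable[[str], float],
           rewrite: Callable[[str], str],
           steps: int) -> str:
    """Greedy refinement: accept a rewrite only if it scores strictly higher."""
    best, best_score = candidate, score(candidate)
    for _ in range(steps):
        proposal = rewrite(best)
        proposal_score = score(proposal)
        if proposal_score > best_score:
            best, best_score = proposal, proposal_score
    return best


def toy_score(candidate: str) -> float:
    """Stand-in for a validator: count required actions that are present."""
    present = set(candidate.split())
    return float(sum(action in present for action in REQUIRED))


def toy_rewrite(candidate: str) -> str:
    """Stand-in for an LLM critique-and-rewrite step: add one missing action."""
    present = set(candidate.split())
    missing = [action for action in REQUIRED if action not in present]
    if not missing:
        return candidate
    return (candidate + " " + missing[0]).strip()


def demo(seed: int = 0) -> str:
    rng = random.Random(seed)

    def toy_generate() -> str:
        # Stand-in for sampling from an LLM: a random subset of the actions.
        return " ".join(a for a in REQUIRED if rng.random() < 0.5)

    initial = best_of_n(toy_generate, toy_score, n=8)
    return refine(initial, toy_score, toy_rewrite, steps=len(REQUIRED))


if __name__ == "__main__":
    print(demo())
```

Because the toy rewrite adds one missing action per step and the loop runs `len(REQUIRED)` steps, the refined candidate always covers every required action, whatever the initial sample was. In a real setting the scoring function would be the expensive part, for example a PDDL parser or planner-based check.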

arXiv information

Authors	Zhouliang Yu, Yuhuan Yuan, Tim Z. Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge Lin, Weiyang Liu
Published	2025-05-08 13:42:18+00:00
