An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs

Summary

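Large language models (LLMs) have recently become proficient at code generation and chain-of-thought reasoning, laying the groundwork for tackling automated formal planning tasks.
This study evaluates how well LLMs can understand and generate the Planning Domain Definition Language (PDDL), a key representation in artificial intelligence planning.
We carry out an extensive analysis of 20 distinct models spanning 7 major LLM families, both commercial and open-source.
Our evaluation sheds light on zero-shot LLM capabilities in parsing, generating, and reasoning with PDDL.
Our findings show that some models handle PDDL with notable effectiveness, while others show limitations in more complex scenarios that require nuanced planning knowledge.
These results highlight both the promise and the current limitations of LLMs in formal planning tasks, offer insights into their application, and guide future work on AI-driven planning paradigms.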

Abstract (original)

In recent advancements, large language models (LLMs) have exhibited proficiency in code generation and chain-of-thought reasoning, laying the groundwork for tackling automatic formal planning tasks. This study evaluates the potential of LLMs to understand and generate Planning Domain Definition Language (PDDL), an essential representation in artificial intelligence planning. We conduct an extensive analysis across 20 distinct models spanning 7 major LLM families, both commercial and open-source. Our comprehensive evaluation sheds light on the zero-shot LLM capabilities of parsing, generating, and reasoning with PDDL. Our findings indicate that while some models demonstrate notable effectiveness in handling PDDL, others pose limitations in more complex scenarios requiring nuanced planning knowledge. These results highlight the promise and current limitations of LLMs in formal planning tasks, offering insights into their application and guiding future efforts in AI-driven planning paradigms.

arXiv information

Authors	Kaustubh Vyas, Damien Graux, Sébastien Montella, Pavlos Vougiouklis, Ruofei Lai, Keshuang Li, Yang Ren, Jeff Z. Pan
Published	2025-02-27 15:13:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
