TravelPlanner: A Benchmark for Real-World Planning with Language Agents

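Summary

Planning has been a central pursuit of artificial intelligence since the field's conception, but early AI agents focused mainly on constrained settings because many of the cognitive substrates needed for human-level planning were lacking.
Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning.
Can these language agents plan in more complex settings that were out of reach for earlier AI agents?
To advance this investigation, we propose TravelPlanner, a new planning benchmark focused on travel planning, a common real-world planning scenario.
It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans.
Comprehensive evaluations show that current language agents cannot yet handle such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%.
Language agents struggle to stay on task, to use the right tools to collect information, and to keep track of multiple constraints.
However, we note that the mere possibility that language agents can tackle such a complex problem is in itself non-trivial progress.
TravelPlanner provides a challenging yet meaningful testbed for future language agents.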

Abstract (original)

Planning has been part of the core pursuit for artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning have been lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks: even GPT-4 only achieves a success rate of 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints. However, we note that the mere possibility for language agents to tackle such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.

arXiv information

Authors	Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su
Publication date	2024-10-23 15:02:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
