SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning

Summary

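Large language models (LLMs) have shown impressive results in the development of generalist planning agents for a variety of tasks.
However, grounding these plans in expansive multi-floor, multi-room environments remains a significant challenge for robotics.
We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics that uses 3D scene graph (3DSG) representations.
To ensure the scalability of our approach, we (1) exploit the hierarchical nature of 3DSGs so that the LLM can run a semantic search for task-relevant subgraphs starting from a smaller, collapsed representation of the full graph;
(2) shorten the LLM's planning horizon by integrating a classical path planner; and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures.
We evaluate the approach on two large-scale environments spanning up to 3 floors, 36 rooms, and 140 objects, and show that it can ground large-scale, long-horizon task plans from abstract, natural-language instructions for a mobile manipulator robot to execute.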

Summary (original)

Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms and 140 objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute.
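
The three mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scene graph contents, the room connectivity, and the function names `collapsed_view`, `shortest_path`, and `check_plan` are all hypothetical, breadth-first search stands in for whatever classical path planner the paper uses, and a simple hand-written check stands in for the scene graph simulator. No LLM is called; the sketch only shows the kind of data each component might exchange.

```python
from collections import deque

# Hypothetical toy scene graph: floor -> rooms -> objects (names invented for illustration).
HIERARCHY = {
    "floor_1": {"kitchen": ["fridge", "mug"], "lobby": ["sofa"]},
    "floor_2": {"office": ["desk", "stapler"], "lab": ["robot_arm"]},
}
# Hypothetical room connectivity used by the path planner stand-in.
EDGES = {
    "kitchen": ["lobby"],
    "lobby": ["kitchen", "office"],
    "office": ["lobby", "lab"],
    "lab": ["office"],
}


def collapsed_view(expanded_floors=(), expanded_rooms=()):
    """Return the graph with only the requested floors and rooms expanded."""
    view = {}
    for floor, rooms in HIERARCHY.items():
        if floor not in expanded_floors:
            view[floor] = "<collapsed>"
            continue
        view[floor] = {
            room: (objs if room in expanded_rooms else "<collapsed>")
            for room, objs in rooms.items()
        }
    return view


def shortest_path(start, goal):
    """Breadth-first search over room connectivity; returns a list of rooms or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in EDGES[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def check_plan(plan):
    """Stand-in for simulator feedback: describe the first infeasible step, or return None."""
    holding = None
    for action, target in plan:
        if action == "pick":
            if holding is not None:
                return f"cannot pick {target}: already holding {holding}"
            holding = target
        elif action == "place":
            if holding != target:
                return f"cannot place {target}: not holding it"
            holding = None
    return None
```

In a loop of this shape, a planner would start from `collapsed_view()`, expand only the floors and rooms that look relevant to the instruction, delegate room-to-room navigation to `shortest_path`, and feed any message returned by `check_plan` back into the next planning attempt.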

arXiv information

Authors	Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf
Published	2023-07-12 12:37:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
