DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

Abstract

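Large language models (LLMs) encode a vast amount of semantic knowledge and have strong understanding and reasoning capabilities.
Previous work has explored how to ground LLMs in robotic tasks so that they generate feasible, executable textual plans.
However, low-level execution in the physical world can deviate from the high-level textual plan because of environmental perturbations or imperfect controller design.
This paper proposes DoReMi, a new language model grounding framework that enables immediate detection of, and recovery from, misalignments between plan and execution.
Specifically, the LLM plays a dual role: it assists with high-level planning and also generates constraints whose violation can indicate misalignment during execution.
Vision language models (VLMs) are then used to detect constraint violations continuously.
The pipeline monitors low-level execution and enables timely recovery when a plan-execution misalignment occurs.
Experiments on a variety of complex tasks, including robot arm and humanoid robot tasks, show that the method can achieve higher task success rates and shorter task completion times.
Videos of DoReMi are available at https://sites.google.com/view/doremi-paper.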

Abstract (original)

Large language models (LLMs) encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities. Previous work has explored how to ground LLMs in robotic tasks to generate feasible and executable textual plans. However, low-level execution in the physical world may deviate from the high-level textual plan due to environmental perturbations or imperfect controller design. In this paper, we propose \textbf{DoReMi}, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, we leverage LLMs to play a dual role, aiding not only in high-level planning but also generating constraints that can indicate misalignment during execution. Then vision language models (VLMs) are utilized to detect constraint violations continuously. Our pipeline can monitor the low-level execution and enable timely recovery if certain plan-execution misalignment occurs. Experiments on various complex tasks including robot arms and humanoid robots demonstrate that our method can lead to higher task success rates and shorter task completion times. Videos of DoReMi are available at \url{https://sites.google.com/view/doremi-paper}.
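
The detect-and-recover loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Step`, `run_with_monitoring`, `toy_planner`, `toy_detector`, and `make_toy_env` are hypothetical, the "LLM" and "VLM" are replaced by hand-written stand-in functions, and the world is a two-flag dictionary. It only shows the control flow: a planner emits steps together with constraints, a detector checks those constraints on every control tick, and a violation aborts the current step and triggers replanning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, bool]  # toy world state (assumption: two boolean flags)


@dataclass
class Step:
    skill: str              # name of a low-level skill
    constraints: List[str]  # conditions that must hold while the skill runs
    duration: int = 3       # control ticks the skill needs (toy assumption)


def run_with_monitoring(
    plan_fn: Callable[[State], List[Step]],
    detect_fn: Callable[[State, str], bool],
    env_step_fn: Callable[[State, Step, int], None],
    state: State,
    max_replans: int = 3,
) -> List[str]:
    """Execute a plan, checking each step's constraints on every tick."""
    log: List[str] = []
    replans = 0
    plan = plan_fn(state)
    while plan:
        step = plan.pop(0)
        violated = None
        for tick in range(step.duration):
            # Observe first: find the first constraint that no longer holds.
            violated = next(
                (c for c in step.constraints if not detect_fn(state, c)), None
            )
            if violated is not None:
                break  # abort the skill immediately
            env_step_fn(state, step, tick)
        if violated is None:
            log.append(f"done: {step.skill}")
            continue
        log.append(f"violated: {violated} during {step.skill}")
        if replans >= max_replans:
            log.append("gave up")
            break
        replans += 1
        plan = plan_fn(state)  # recover by replanning from the current state
    return log


def toy_planner(state: State) -> List[Step]:
    # Stand-in for the LLM planner: remaining steps plus their constraints.
    if state["in_bowl"]:
        return []
    steps = []
    if not state["holding"]:
        steps.append(Step("pick block", constraints=[]))
    steps.append(Step("place block in bowl", constraints=["holding block"]))
    return steps


def toy_detector(state: State, constraint: str) -> bool:
    # Stand-in for the VLM detector: reads the answer from the toy state.
    if constraint == "holding block":
        return state["holding"]
    raise ValueError(f"unknown constraint: {constraint}")


def make_toy_env(drop_once: bool = True) -> Callable[[State, Step, int], None]:
    dropped = {"done": not drop_once}

    def env_step(state: State, step: Step, tick: int) -> None:
        last = tick == step.duration - 1
        if step.skill == "pick block" and last:
            state["holding"] = True
        elif step.skill == "place block in bowl":
            if not dropped["done"] and tick == 0:
                state["holding"] = False  # perturbation: the block slips once
                dropped["done"] = True
            elif last:
                state["holding"], state["in_bowl"] = False, True

    return env_step


state = {"holding": False, "in_bowl": False}
log = run_with_monitoring(toy_planner, toy_detector, make_toy_env(), state)
print("\n".join(log))
```

In this sketch the constraint check runs before each control tick, so the dropped block is noticed one tick after it slips rather than when the step finishes. A real system would replace the stand-ins with model queries and a robot controller; the check frequency and the recovery strategy are design choices the abstract does not specify.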

arXiv information

Authors	Yanjiang Guo, Yen-Jen Wang, Lihan Zha, Zheyuan Jiang, Jianyu Chen
Published	2023-09-30 13:40:20+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
