Deploying and Evaluating LLMs to Program Service Mobile Robots

Summary

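Recent advances in large language models (LLMs) have driven interest in using them to generate robot programs from natural language, with promising early results.
We investigate using LLMs to generate programs for service mobile robots that draw on mobility, perception, and human-interaction skills, a setting in which correct sequencing and ordering of actions is essential to success.
We contribute CodeBotler, an open-source, robot-agnostic tool for programming service mobile robots from natural language, and RoboEval, a benchmark for evaluating how well LLMs generate programs that complete service robot tasks.
CodeBotler generates programs by few-shot prompting LLMs with a domain-specific language embedded in Python (eDSL), and uses skill abstractions to deploy the generated programs on any general-purpose mobile robot.
RoboEval evaluates the correctness of a generated program by checking its execution traces, started from multiple initial states, against temporal logic properties that encode correctness for each task.
RoboEval also includes multiple prompts per task to test the robustness of program generation.
We evaluate several popular state-of-the-art LLMs on the RoboEval benchmark and analyze their failure modes in depth, yielding a taxonomy of common pitfalls LLMs encounter when generating robot programs.
The code and benchmark are released at https://amrl.cs.utexas.edu/codebotler/.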
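
To make the idea of skill abstractions concrete, here is a minimal sketch of how a generated program might be written against a small skill interface and run on an interchangeable backend. The skill names (`go_to`, `is_in_room`, `say`), the `SimulatedRobot` class, and the example task are all assumptions chosen for illustration; they are not taken from the CodeBotler release and may differ from its actual eDSL.

```python
from typing import Dict, List, Protocol, Set, Tuple


class RobotSkills(Protocol):
    """Hypothetical skill interface; the method names are illustrative only."""

    def go_to(self, location: str) -> None: ...
    def is_in_room(self, obj: str) -> bool: ...
    def say(self, message: str) -> None: ...


class SimulatedRobot:
    """Toy backend (an assumption) that records each skill call as a trace event."""

    def __init__(self, objects_by_room: Dict[str, Set[str]], start: str) -> None:
        self.objects_by_room = objects_by_room
        self.location = start
        self.trace: List[Tuple[str, str]] = []

    def go_to(self, location: str) -> None:
        self.location = location
        self.trace.append(("go_to", location))

    def is_in_room(self, obj: str) -> bool:
        self.trace.append(("is_in_room", obj))
        return obj in self.objects_by_room.get(self.location, set())

    def say(self, message: str) -> None:
        self.trace.append(("say", message))


def check_for_stapler(robot: RobotSkills) -> None:
    """The kind of program an LLM might produce for the request:
    'Go to the printer room, check for a stapler, and report back.'"""
    robot.go_to("printer room")
    found = robot.is_in_room("stapler")
    robot.go_to("start")
    robot.say("There is a stapler." if found else "There is no stapler.")


# The program depends only on the skill interface, so any backend that
# implements the same three methods could run it unchanged.
robot = SimulatedRobot({"printer room": {"stapler"}}, start="start")
check_for_stapler(robot)
```

Because the program only calls the skill methods, swapping the simulated backend for one that drives real hardware would not require changing the program itself, which is the property the abstract attributes to skill abstractions.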

Summary (original)

Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs’ capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of generated programs by checking execution traces starting with multiple initial states, and checking whether the traces satisfy temporal logic properties that encode correctness for each task. RoboEval also includes multiple prompts per task to test for the robustness of program generation. We evaluate several popular state-of-the-art LLMs with the RoboEval benchmark, and perform a thorough analysis of the modes of failures, resulting in a taxonomy that highlights common pitfalls of LLMs at generating robot programs. We release our code and benchmark at https://amrl.cs.utexas.edu/codebotler/.
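
The trace-based evaluation described in the abstract can be illustrated with a small sketch that checks finite execution traces against two simple temporal properties, "eventually" and "precedes", across more than one initial state. This is not RoboEval's checker: the event format, the `run_program` stand-in, and the chosen properties are assumptions made for illustration, and the abstract does not specify which temporal logic the benchmark uses.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (skill name, argument); an assumed trace format
Trace = List[Event]


def eventually(trace: Trace, event: Event) -> bool:
    """True if `event` occurs somewhere in the trace."""
    return event in trace


def precedes(trace: Trace, first: Event, second: Event) -> bool:
    """True if every occurrence of `second` comes after some occurrence of `first`."""
    seen_first = False
    for event in trace:
        if event == first:
            seen_first = True
        elif event == second and not seen_first:
            return False
    return True


def run_program(stapler_present: bool) -> Trace:
    """Hypothetical stand-in for executing a generated program in a simulator."""
    answer = "There is a stapler." if stapler_present else "There is no stapler."
    return [
        ("go_to", "printer room"),
        ("is_in_room", "stapler"),
        ("go_to", "start"),
        ("say", answer),
    ]


def satisfies_task(trace: Trace, stapler_present: bool) -> bool:
    """Check the report is correct and the actions happen in a sensible order."""
    expected = "There is a stapler." if stapler_present else "There is no stapler."
    report = ("say", expected)
    look = ("is_in_room", "stapler")
    return (
        eventually(trace, report)
        and precedes(trace, ("go_to", "printer room"), look)
        and precedes(trace, look, report)
    )


# Evaluate the same program from two initial states of the world.
results: Dict[bool, bool] = {
    present: satisfies_task(run_program(present), present)
    for present in (True, False)
}
```

Checking several initial states matters because a program that always reports "There is a stapler." would pass in one world and fail in the other; a single run could not tell the two apart.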

arXiv information

Authors	Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas
Published	2023-11-18 23:24:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
