Language Models as Zero-Shot Trajectory Generators

Abstract (original)

Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as ‘open the bottle cap’ and ‘wipe the plate with the sponge’, and we investigated which design choices in this prompt are the most important. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. Videos, prompts, and code are available at: https://www.robot-learning.uk/language-models-trajectory-generators.
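The abstract's central object is "a dense sequence of end-effector poses". To make that concrete, the sketch below shows what such a sequence might look like for a task like 'wipe the plate with the sponge'. It is a rough illustration only. The `Pose` fields (position in metres, a single yaw angle, a gripper flag), the helper `wipe_trajectory`, the circular path and every numeric value are hypothetical assumptions. They are not the paper's prompt, output format or method.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """One end-effector waypoint (hypothetical format, not the paper's)."""
    x: float               # position in metres
    y: float
    z: float
    yaw: float             # rotation about the vertical axis, radians
    gripper_closed: bool   # True while holding the sponge


def wipe_trajectory(center_x: float, center_y: float, height: float,
                    radius: float, loops: int = 2,
                    points_per_loop: int = 36) -> list[Pose]:
    """Hypothetical example: a dense circular wiping motion above a plate.

    Returns loops * points_per_loop + 1 poses, so the path ends where it started.
    """
    poses = []
    total = loops * points_per_loop
    for i in range(total + 1):
        theta = 2 * math.pi * i / points_per_loop
        poses.append(Pose(x=center_x + radius * math.cos(theta),
                          y=center_y + radius * math.sin(theta),
                          z=height,
                          yaw=0.0,
                          gripper_closed=True))
    return poses


traj = wipe_trajectory(center_x=0.5, center_y=0.0, height=0.05, radius=0.08)
print(len(traj))  # → 73
```

The point of the example is density. The motion is spelled out as many closely spaced poses rather than as a call to a pre-built "wipe" primitive, which is the distinction the abstract draws.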

arXiv information

Authors	Teyun Kwon, Norman Di Palo, Edward Johns
Published	2024-06-17 23:57:03+00:00
arXiv site	arxiv_id (pdf)

Provider, services used

arxiv.jp, Google
