Grounding Language with Visual Affordances over Unstructured Data

Summary

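Recent work has shown that Large Language Models (LLMs) can be applied to ground natural language in a wide variety of robot skills.
In practice, however, learning multi-task, language-conditioned robot skills typically requires large-scale data collection and frequent human intervention to reset the environment or to help correct the current policies.
In this work, we propose a novel approach for efficiently learning general-purpose, language-conditioned robot skills from unstructured, offline, reset-free real-world data by exploiting a self-supervised visuo-lingual affordance model, which requires annotating as little as 1% of the total data with language.
We evaluate the method in extensive experiments on both simulated and real-world robotic tasks, achieving state-of-the-art performance on the challenging CALVIN benchmark and learning over 25 distinct visuomotor manipulation tasks with a single policy in the real world.
When paired with LLMs that break abstract natural language instructions down into subgoals via few-shot prompting, our method can complete long-horizon, multi-tier tasks in the real world while requiring an order of magnitude less data than previous approaches.
Code and videos are available at http://hulc2.cs.uni-freiburg.de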

Summary (original)

Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires large-scale data collection and frequent human intervention to reset the environment or help correcting the current policies. In this work, we propose a novel approach to efficiently learn general-purpose language-conditioned robot skills from unstructured, offline and reset-free data in the real world by exploiting a self-supervised visuo-lingual affordance model, which requires annotating as little as 1% of the total data with language. We evaluate our method in extensive experiments both in simulated and real-world robotic tasks, achieving state-of-the-art performance on the challenging CALVIN benchmark and learning over 25 distinct visuomotor manipulation tasks with a single policy in the real world. We find that when paired with LLMs to break down abstract natural language instructions into subgoals via few-shot prompting, our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches. Code and videos are available at http://hulc2.cs.uni-freiburg.de

arXiv information

Authors	Oier Mees, Jessica Borja-Diaz, Wolfram Burgard
Published	2023-03-08 11:00:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
