GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs

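Abstract

Robot simulation remains hard to scale today because of the human effort needed to create diverse simulation tasks and scenes. Policies trained in simulation face scalability problems as well, since many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that uses coding LLMs with multi-modal and reasoning capabilities to create complex, realistic simulation tasks, including long-horizon tasks with articulated objects. To generate demonstration data for these tasks automatically and at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects while reducing the human effort required. To make use of this data, we propose an effective multi-task, language-conditioned policy architecture called the proprioceptive point-cloud transformer (PPT), which learns from the generated demonstrations and shows strong zero-shot sim-to-real transfer. Combining the pipeline with the policy architecture, we show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with data collected in the real world, which improves policy performance by 20% compared with training only on limited real data.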

Abstract (original)

Robotic simulation today remains challenging to scale up due to the human efforts required to create diverse simulation tasks and scenes. Simulation-trained policies also face scalability issues as many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding LLMs with multi-modal and reasoning capabilities for complex and realistic simulation task creation, including long-horizon tasks with articulated objects. To automatically generate demonstration data for these tasks at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduce the required human efforts. To utilize such data, we propose an effective multi-task language-conditioned policy architecture, dubbed proprioceptive point-cloud transformer (PPT), that learns from the generated demonstrations and exhibits strong sim-to-real zero-shot transfer. Combining the proposed pipeline and the policy architecture, we show a promising usage of GenSim2 that the generated data can be used for zero-shot transfer or co-train with real-world collected data, which enhances the policy performance by 20% compared with training exclusively on limited real data.

arXiv information

Authors	Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang
Published	2024-10-04 17:51:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
