Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Summary

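Large-scale efforts such as RT-1 and broad community efforts such as Open-X-Embodiment have helped grow the scale of robot demonstration data.
However, there is still room to improve the quality, quantity, and diversity of robot demonstration data.
Vision-language models have been shown to generate demonstration data automatically, but their usefulness has been limited to environments with privileged state information, they require hand-designed skills, and they are limited to interactions with a small number of object instances.
We propose Manipulate-Anything, a scalable automated data-generation method for real-world robot manipulation.
Unlike prior work, our method operates in real-world environments without privileged state information or hand-designed skills, and it can manipulate any static object.
We evaluate the method in two setups.
First, Manipulate-Anything successfully generates trajectories for all 5 real-world tasks and all 12 simulation tasks, significantly outperforming existing methods such as VoxPoser.
Second, demonstrations from Manipulate-Anything can train more robust behavior cloning policies than training on human demonstrations or on data generated by VoxPoser and Code-As-Policies.
We believe Manipulate-Anything is a scalable method both for generating robotics data and for solving novel tasks in a zero-shot setting.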
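Behavior cloning, which the summary uses to compare data sources, treats demonstrations as supervised (state, action) pairs and fits a policy that imitates the demonstrator. The sketch below illustrates only that general idea, assuming a hypothetical 1-D state, a hypothetical linear policy `a = w * s + b`, and synthetic demonstrations. It is not the policy architecture or training setup used in the paper, which the abstract does not describe.

```python
from typing import List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit a hypothetical 1-D linear policy a = w * s + b to
    (state, action) demonstration pairs via closed-form least squares."""
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two demonstrations")
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    # Covariance of states and actions, and variance of states.
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    if var == 0:
        raise ValueError("states must not all be identical")
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


def policy(w: float, b: float, state: float) -> float:
    """Cloned policy: predict the demonstrator's action for a new state."""
    return w * state + b


# Synthetic demonstrations from a hypothetical expert that acts as a = 2s + 1.
demos = [(s, 2.0 * s + 1.0) for s in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = fit_linear_policy(demos)
print(round(w, 6), round(b, 6))     # 2.0 1.0
print(round(policy(w, b, 3.0), 6))  # 7.0
```

The cloned policy can only be as good as the demonstrations it is fit to, which is why the quality and diversity of generated demonstration data matter for the comparison in the summary.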

Summary (original)

Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information, hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser. Second, Manipulate-Anything’s demonstrations can train more robust behavior cloning policies than training with human demonstrations, or from data generated by VoxPoser and Code-As-Policies. We believe Manipulate-Anything can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting.

arXiv information

Authors	Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna
Published	2024-06-27 06:12:01+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
