Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

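Summary

If generalist robots are to operate in truly unstructured environments, they must be able to recognize and reason about novel objects and scenarios.
Such objects and scenarios may not be present in the robot's own training data.
We propose SuSIE, a method that uses an image-editing diffusion model as a high-level planner, proposing intermediate subgoals that a low-level controller can accomplish.
Specifically, we fine-tune InstructPix2Pix on video data consisting of both human videos and robot rollouts, so that it outputs a hypothetical future "subgoal" observation given the robot's current observation and a language command.
We also use the robot data to train a low-level goal-conditioned policy that serves as this low-level controller.
We find that the high-level subgoal predictions can draw on Internet-scale pretraining and visual understanding to guide the low-level goal-conditioned policy, achieving significantly better generalization and precision than conventional language-conditioned policies.
We achieve state-of-the-art results on the CALVIN benchmark and demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that use orders of magnitude more compute and training data.
Project website: http://rail-berkeley.github.io/susie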

Summary (original)

If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might not be present in the robot’s own training data. We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controller can accomplish. Specifically, we finetune InstructPix2Pix on video data, consisting of both human videos and robot rollouts, such that it outputs hypothetical future ‘subgoal’ observations given the robot’s current observation and a language command. We also use the robot data to train a low-level goal-conditioned policy to act as the aforementioned low-level controller. We find that the high-level subgoal predictions can utilize Internet-scale pretraining and visual understanding to guide the low-level goal-conditioned policy, achieving significantly better generalization and precision than conventional language-conditioned policies. We achieve state-of-the-art results on the CALVIN benchmark, and also demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that utilize orders of magnitude more compute and training data. The project website can be found at http://rail-berkeley.github.io/susie .
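
The two-level structure described in the abstract, a subgoal proposer on top and a goal-conditioned policy underneath, can be sketched as a simple control loop. This is only an illustrative sketch and not the authors' implementation: the function names (`run_hierarchical_episode`, `toy_propose_subgoal`, `toy_policy`, `toy_step_env`), the one-dimensional "observation", and the values of `num_subgoals` and `steps_per_subgoal` are all assumptions made for the example. In the actual method the proposer would be the fine-tuned image-editing diffusion model and the observations would be images.

```python
from typing import Callable, List, Tuple

Observation = List[float]  # hypothetical stand-in for a camera image
Action = List[float]


def run_hierarchical_episode(
    obs: Observation,
    command: str,
    propose_subgoal: Callable[[Observation, str], Observation],
    policy: Callable[[Observation, Observation], Action],
    step_env: Callable[[Observation, Action], Observation],
    num_subgoals: int = 3,        # assumed value, not from the source
    steps_per_subgoal: int = 5,   # assumed value, not from the source
) -> Tuple[Observation, List[Observation]]:
    """Alternate between proposing a subgoal and pursuing it."""
    subgoals: List[Observation] = []
    for _ in range(num_subgoals):
        # High level: imagine a future observation from the current one
        # and the language command.
        subgoal = propose_subgoal(obs, command)
        subgoals.append(subgoal)
        # Low level: the goal-conditioned policy sees only the current
        # observation and the subgoal, never the language command.
        for _ in range(steps_per_subgoal):
            action = policy(obs, subgoal)
            obs = step_env(obs, action)
    return obs, subgoals


# Toy stand-ins so the loop can be executed end to end.
TOY_TARGET = 3.0  # hypothetical final state implied by the command


def toy_propose_subgoal(obs: Observation, command: str) -> Observation:
    # Propose a state one unit closer to the target, never beyond it.
    return [min(obs[0] + 1.0, TOY_TARGET)]


def toy_policy(obs: Observation, goal: Observation) -> Action:
    # Move toward the goal with a step size clipped to 0.25.
    delta = goal[0] - obs[0]
    return [max(-0.25, min(0.25, delta))]


def toy_step_env(obs: Observation, action: Action) -> Observation:
    return [obs[0] + action[0]]


if __name__ == "__main__":
    final_obs, proposed = run_hierarchical_episode(
        [0.0], "move right", toy_propose_subgoal, toy_policy, toy_step_env
    )
    print(final_obs, proposed)
```

The point of the split is that the language command only enters through the subgoal proposer, so the low-level policy can be trained on robot data alone while the proposer carries the pretrained visual knowledge.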

arXiv information

Authors	Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine
Published	2023-10-16 17:57:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
