Masked World Models for Visual Control

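Summary

Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations.
However, current approaches typically train a single model end-to-end to learn both visual representations and dynamics, which makes it difficult to accurately model interactions between robots and small objects.
This work introduces a visual model-based RL framework that decouples visual representation learning from dynamics learning.
Specifically, an autoencoder built from convolutional layers and vision transformers (ViT) is trained to reconstruct pixels from masked convolutional features, and a latent dynamics model is learned on top of the autoencoder's representations.
In addition, an auxiliary reward prediction objective is added to the autoencoder so that it encodes task-relevant information.
Both the autoencoder and the dynamics model are updated continually using online samples collected through environment interaction.
The decoupled approach achieves state-of-the-art performance on a variety of visual robotic manipulation tasks from Meta-world and RLBench. For example, it reaches an 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, whereas the baseline reaches 67.9%.
Code is available on the project website (https://sites.google.com/view/mwm-rl).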
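
The masking step and the auxiliary reward objective described in this summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 8x8 token grid, the 0.75 mask ratio, the squared-error losses, and the reward weight are all assumptions chosen for illustration, and the ViT encoder and decoder themselves are omitted.

```python
import random


def split_masked_tokens(num_tokens, mask_ratio, rng):
    """Randomly split token indices into (visible, masked) lists."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(num_tokens * mask_ratio)
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def autoencoder_loss(pixels, recon, reward, reward_pred, reward_weight=1.0):
    """Mean squared pixel error plus a weighted squared reward error.

    The loss form and reward_weight are assumptions, not taken from the paper.
    """
    pixel_mse = sum((p - r) ** 2 for p, r in zip(pixels, recon)) / len(pixels)
    reward_err = (reward - reward_pred) ** 2
    return pixel_mse + reward_weight * reward_err


rng = random.Random(0)
# Hypothetical 8x8 grid of convolutional feature vectors, flattened to 64 tokens.
visible, masked = split_masked_tokens(num_tokens=64, mask_ratio=0.75, rng=rng)
# Only the visible tokens would be passed to the ViT encoder; the decoder
# would then be asked to reconstruct all pixels and predict the reward.
loss = autoencoder_loss([0.0, 1.0], [0.5, 0.5], reward=1.0, reward_pred=0.0)
```

With these toy inputs, 16 of the 64 feature tokens stay visible and 48 are masked. Masking at the level of convolutional features rather than raw pixel patches is the design choice the summary highlights; the sketch only shows the index bookkeeping and the combined objective.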

Abstract (original)

Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning. Specifically, we train an autoencoder with convolutional layers and vision transformers (ViT) to reconstruct pixels given masked convolutional features, and learn a latent dynamics model that operates on the representations from the autoencoder. Moreover, to encode task-relevant information, we introduce an auxiliary reward prediction objective for the autoencoder. We continually update both autoencoder and dynamics model using online samples collected from environment interaction. We demonstrate that our decoupling approach achieves state-of-the-art performance on a variety of visual robotic tasks from Meta-world and RLBench, e.g., we achieve 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, while the baseline achieves 67.9%. Code is available on the project website: https://sites.google.com/view/mwm-rl.

arXiv information

Authors	Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel
Published	2023-05-27 09:29:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
