Pre-trained Visual Dynamics Representations for Efficient Policy Learning

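Abstract

Pre-training for reinforcement learning (RL) using only video data is a valuable but difficult problem.
In-the-wild videos are readily available and contain a vast amount of prior world knowledge, but the lack of action annotations and the domain gap that commonly separates them from downstream tasks hinder their use for RL pre-training.
To address the challenges of pre-training with videos, we propose Pre-trained Visual Dynamics Representations (PVDR), which bridge the domain gap between videos and downstream tasks for efficient policy learning.
Adopting video prediction as the pre-training task, we use a Transformer-based Conditional Variational Autoencoder (CVAE) to learn visual dynamics representations.
The pre-trained visual dynamics representations capture prior knowledge of the visual dynamics in the videos.
This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation.
We conduct experiments on a series of robotic visual control tasks and verify that PVDR is an effective form of video pre-training for promoting policy learning.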

Abstract (original)

Pre-training for Reinforcement Learning (RL) with purely video data is a valuable yet challenging problem. Although in-the-wild videos are readily available and inhere a vast amount of prior world knowledge, the absence of action annotations and the common domain gap with downstream tasks hinder utilizing videos for RL pre-training. To address the challenge of pre-training with videos, we propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap between videos and downstream tasks for efficient policy learning. By adopting video prediction as a pre-training task, we use a Transformer-based Conditional Variational Autoencoder (CVAE) to learn visual dynamics representations. The pre-trained visual dynamics representations capture the visual dynamics prior knowledge in the videos. This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation. We conduct experiments on a series of robotics visual control tasks and verify that PVDR is an effective form for pre-training with videos to promote policy learning.
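
For readers unfamiliar with the CVAE named in the abstract, the standard conditional evidence lower bound is sketched below. This is the generic textbook objective, not necessarily the loss PVDR uses. All symbols are assumptions introduced here for illustration: $c$ stands for the observed past frames, $x$ for the future frames to predict, $z$ for the latent visual dynamics representation, and $\theta$, $\phi$, $\psi$ for the decoder, posterior, and prior parameters.

```latex
\log p(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
- D_{\mathrm{KL}}\big(q_\phi(z \mid x, c) \,\|\, p_\psi(z \mid c)\big)
```

Under these assumptions, the first term rewards accurate prediction of the future frames, and the KL term keeps the posterior over $z$ close to a prior that depends only on the past frames.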

arXiv information

Authors	Hao Luo, Bohan Zhou, Zongqing Lu
Published	2024-11-05 15:18:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
