Leveraging Demonstrations with Latent Space Priors

Summary

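Demonstrations offer insight into the relevant regions of state or action space and hold great potential to improve the efficiency and practicality of reinforcement learning agents.
In this work, we propose to leverage demonstration datasets by combining skill learning with sequence modeling.
Starting from a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy.
The sequence model forms a latent space prior over plausible demonstration behaviors, which accelerates the learning of high-level policies.
We show how to obtain such priors from state-only motion capture demonstrations and explore several ways of integrating them into policy learning on transfer tasks.
Our experimental results confirm that latent space priors yield significant gains in learning speed and final performance.
We benchmark the approach on a set of challenging sparse-reward environments with a complex simulated humanoid, and on offline RL benchmarks for navigation and object manipulation.
Videos, source code, and pre-trained models are available on the corresponding project website (https://facebookresearch.github.io/latent-space-priors).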

Summary (original)

Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance. We benchmark our approach on a set of challenging sparse-reward environments with a complex, simulated humanoid, and on offline RL benchmarks for navigation and object manipulation. Videos, source code and pre-trained models are available at the corresponding project website at https://facebookresearch.github.io/latent-space-priors .
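The control structure described in the abstract (a sequence model acting as a prior over latents, a high-level policy choosing latents, and a low-level policy turning latents into actions) can be sketched as a toy rollout. This is only an illustration under stated assumptions, not the authors' implementation: every function name, the linear update rules, the blending weight `mix`, and the choice to give the state the same dimension as the latent are hypothetical. In particular, blending the prior's proposal with a task-driven target is just one simple way to use a prior and is not claimed to be one of the integration methods the paper explores.

```python
import random

LATENT_DIM = 4  # hypothetical latent size; the toy state uses the same size


def prior_step(prev_z, rng):
    """Toy autoregressive prior: damped copy of the previous latent plus noise.

    Stands in for a sequence model trained on demonstration latents.
    """
    return [0.9 * z + 0.1 * rng.gauss(0.0, 1.0) for z in prev_z]


def high_level_policy(state, z_prior, mix=0.5):
    """Toy high-level policy: blend the prior's proposal with a task target.

    The task target here is simply -state (drive the state toward zero).
    """
    return [(1.0 - mix) * zp + mix * (-s) for zp, s in zip(z_prior, state)]


def low_level_policy(state, z):
    """Toy low-level policy: fixed linear map from (state, latent) to action."""
    return [zi - 0.5 * si for si, zi in zip(state, z)]


def rollout(steps=5, seed=0):
    """Run the hierarchy for a few steps and record (latent, action, state)."""
    rng = random.Random(seed)
    state = [1.0] * LATENT_DIM
    z = [0.0] * LATENT_DIM
    trajectory = []
    for _ in range(steps):
        z_prior = prior_step(z, rng)          # prior proposes the next latent
        z = high_level_policy(state, z_prior)  # high level picks a latent
        action = low_level_policy(state, z)    # low level decodes an action
        state = [s + 0.1 * a for s, a in zip(state, action)]  # toy dynamics
        trajectory.append((z, action, state))
    return trajectory


if __name__ == "__main__":
    for z, action, state in rollout():
        print([round(v, 3) for v in state])
```

In a real system each of the three functions would be a learned model; the sketch only shows how the pieces connect at each control step.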

arXiv information

Authors	Jonas Gehring, Deepak Gopinath, Jungdam Won, Andreas Krause, Gabriel Synnaeve, Nicolas Usunier
Published	2023-03-13 22:55:56+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
