Sample-efficient Adversarial Imitation Learning

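Summary

Imitation learning, which learns from demonstrations, has been studied and advanced for sequential decision-making tasks in which no reward function is predefined.
However, imitation learning methods still require many expert demonstration samples to imitate an expert's behavior successfully.
To improve sample efficiency, we use self-supervised representation learning, which can generate abundant training signals from the given data.
In this study, we propose a self-supervised representation-based adversarial imitation learning method for non-image control tasks. It learns state and action representations that are robust to diverse distortions and temporally predictive.
In particular, we propose a corruption method for state and action representations that differs from those of existing self-supervised learning methods for tabular data and is robust to diverse distortions.
We observe, both theoretically and empirically, that forming an informative feature manifold with lower sample complexity significantly improves imitation learning performance.
In a setting limited to 100 expert state-action pairs, the proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo.
Furthermore, we conduct comprehensive ablations and additional experiments with demonstrations of varying optimality to provide insight into a range of factors.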
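The summary mentions corrupting tabular state-action vectors to create augmented views for self-supervised learning. The paper's own corruption scheme is not spelled out here, so the sketch below substitutes a simpler, well-known baseline instead: SCARF-style marginal sampling. In this scheme, each feature is replaced, with probability `rate`, by the same feature taken from a random row of the batch. The function name `corrupt_batch` and the toy batch are hypothetical.

```python
import random
from typing import List, Sequence


def corrupt_batch(batch: Sequence[Sequence[float]], rate: float,
                  rng: random.Random) -> List[List[float]]:
    """Return a corrupted copy of `batch` (rows = concatenated state-action vectors).

    Each feature is replaced, with probability `rate`, by the same feature taken
    from a randomly chosen row of the batch (possibly the same row), i.e. a draw
    from that feature's empirical marginal. This SCARF-style scheme is an
    illustrative baseline only, NOT the corruption method proposed in the paper.
    """
    n = len(batch)
    corrupted = []
    for row in batch:
        new_row = list(row)  # copy so the input batch is never mutated
        for j in range(len(row)):
            if rng.random() < rate:
                new_row[j] = batch[rng.randrange(n)][j]
        corrupted.append(new_row)
    return corrupted


# Toy usage with made-up values: three (state, action) rows, two views each.
rng = random.Random(0)
batch = [
    [0.1, 0.2, 0.3, 1.0],
    [0.4, 0.5, 0.6, -1.0],
    [0.7, 0.8, 0.9, 0.0],
]
view_a = corrupt_batch(batch, rate=0.3, rng=rng)
view_b = corrupt_batch(batch, rate=0.3, rng=rng)
# (view_a[i], view_b[i]) would form a positive pair for a contrastive loss.
```

In a contrastive setup such as SCARF, an encoder would pull the two views of the same row together in representation space and push views of different rows apart. That loss is omitted from this sketch.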

Summary (original)

Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert’s behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in comparison with existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors.
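In adversarial imitation learning (GAIL-style), a discriminator learns to tell expert state-action pairs apart from the policy's pairs, and its output serves as the policy's reward. The sketch below is a minimal pure-Python illustration that rests on several assumptions:

- The discriminator is a logistic regression.
- The reward is the common surrogate -log(1 - D).
- A hypothetical `encode` function stands in for the paper's learned self-supervised representation by simply concatenating state and action.

The policy-optimization step is left out entirely, and the toy data are made up. Only the 100-pair budget mirrors the abstract.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def encode(state: Sequence[float], action: Sequence[float]) -> Vector:
    # Hypothetical placeholder encoder: the paper learns a self-supervised
    # state-action representation; here we simply concatenate the two vectors.
    return list(state) + list(action)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """D(x) = estimated probability that feature vector x came from the expert."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def step(self, expert: Sequence[Vector], policy: Sequence[Vector], lr: float) -> None:
        # One full-batch gradient step that lowers binary cross-entropy,
        # with expert pairs labelled 1 and policy pairs labelled 0.
        labelled = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in labelled:
            err = y - self.prob(x)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        n = len(labelled)
        self.w = [wi + lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += lr * grad_b / n

    def reward(self, x: Sequence[float]) -> float:
        # GAIL-style surrogate reward -log(1 - D(x)): higher for expert-like pairs.
        return -math.log(max(1.0 - self.prob(x), 1e-8))


# Toy usage with made-up 1-D states and actions; only the 100-pair budget
# mirrors the abstract. Expert actions cluster near +1, policy actions near -1.
rng = random.Random(0)
expert = [encode([rng.gauss(0.0, 1.0)], [1.0 + 0.1 * rng.gauss(0.0, 1.0)]) for _ in range(100)]
policy = [encode([rng.gauss(0.0, 1.0)], [-1.0 + 0.1 * rng.gauss(0.0, 1.0)]) for _ in range(100)]

disc = LogisticDiscriminator(dim=2)
for _ in range(200):
    disc.step(expert, policy, lr=0.5)

r_expert_like = disc.reward(encode([0.0], [1.0]))
r_policy_like = disc.reward(encode([0.0], [-1.0]))
```

In this sketch the encoder is kept separate from the discriminator. A learned representation could therefore replace `encode` without changing the adversarial objective.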

arXiv information

Authors	Dahuin Jung, Hyungyu Lee, Sungroh Yoon
Published	2023-03-14 12:36:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
