Policy Contrastive Imitation Learning

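Summary

Adversarial imitation learning (AIL) is a popular method that has recently achieved considerable success.
However, its performance remains unsatisfactory on more challenging tasks.
We find that one major reason is the low quality of the representation learned by the AIL discriminator.
Because the AIL discriminator is trained by binary classification, it does not necessarily distinguish the policy from the expert in a meaningful way, so the resulting reward may not be meaningful either.
To resolve this issue, we propose a new method called Policy Contrastive Imitation Learning (PCIL).
PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth reward based on cosine similarity.
Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provides a more meaningful comparison between the agent and the policy.
From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework.
Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance.
Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.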
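
The summary names two ingredients: a contrastive objective over policy embeddings and a reward based on cosine similarity. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The summary does not give the exact loss or reward formula, so the InfoNCE-style loss, the averaging over expert embeddings, the `temperature` parameter, and all function names here are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_reward(agent_embedding, expert_embeddings):
    """Hypothetical reward: mean cosine similarity between one agent
    embedding and a set of expert embeddings (an assumption, not the
    paper's stated formula)."""
    sims = [cosine_similarity(agent_embedding, e) for e in expert_embeddings]
    return sum(sims) / len(sims)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): an expert anchor is pulled
    toward another expert embedding (positive) and pushed away from
    agent embeddings (negatives)."""
    pos = cosine_similarity(anchor, positive) / temperature
    logits = [pos] + [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - pos


# Toy 2-D embeddings: one expert sample aligned with the agent, one orthogonal.
print(cosine_reward([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

Because cosine similarity is bounded in [-1, 1] and varies continuously with the embedding, a reward built this way changes gradually as the agent's embedding moves toward the expert's, unlike a hard classifier decision. In practice the embeddings would come from a learned encoder, which this sketch leaves out.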

Summary (original)

Adversarial imitation learning (AIL) is a popular method that has recently achieved much success. However, the performance of AIL is still unsatisfactory on the more challenging tasks. We find that one of the major reasons is due to the low quality of AIL discriminator representation. Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provide a more meaningful comparison between the agent and the policy. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.

arXiv information

Authors	Jialei Huang, Zhaoheng Yin, Yingdong Hu, Yang Gao
Published	2023-07-06 07:52:50+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
