IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic

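Summary

This paper introduces the SOAR framework for imitation learning.
SOAR is an algorithmic template that learns a policy from expert demonstrations using a primal-dual style algorithm that alternates between cost updates and policy updates.
Within the policy updates, the SOAR framework uses an actor-critic method with multiple critics to estimate the critic's uncertainty and to build an optimistic critic, which is fundamental for driving exploration.
When instantiated in the tabular setting, it yields a provable algorithm with guarantees that match the best known results in $\epsilon$.
In practice, the SOAR template is shown to consistently improve the performance of imitation learning algorithms based on Soft Actor Critic, such as f-IRL, ML-IRL, and CSIL, in several MuJoCo environments.
Overall, SOAR halves the number of episodes required to reach the same performance.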
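
To make the idea of an optimistic critic built from multiple critics more concrete, here is a minimal sketch. It is not the authors' construction. It assumes that critic values are costs (lower is better), that uncertainty is measured by the standard deviation across critics, and that optimism means subtracting a scaled bonus from the ensemble mean. The function name `optimistic_cost_to_go` and the parameters `bonus_scale` and `lower_bound` are hypothetical.

```python
from statistics import mean, pstdev


def optimistic_cost_to_go(critic_estimates, bonus_scale=1.0, lower_bound=0.0):
    """Combine several critics' cost estimates for one state-action pair.

    Assumption (not taken from the paper): values are costs, so an
    optimistic estimate is the ensemble mean minus an uncertainty bonus,
    clipped from below at `lower_bound`.
    """
    center = mean(critic_estimates)
    # Population standard deviation: how much the critics disagree.
    spread = pstdev(critic_estimates)
    return max(lower_bound, center - bonus_scale * spread)


# Hypothetical estimates from four critics for two state-action pairs.
well_explored = [5.0, 5.0, 5.0, 5.0]   # critics agree, so no bonus
rarely_visited = [3.0, 7.0, 3.0, 7.0]  # critics disagree, so a large bonus

print(optimistic_cost_to_go(well_explored))   # 5.0
print(optimistic_cost_to_go(rarely_visited))  # 3.0
```

Both pairs have the same mean estimate, but the pair on which the critics disagree receives a lower optimistic cost. A policy that minimizes this quantity is therefore drawn toward state-action pairs it knows less about, which is the exploration effect the summary describes.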

Abstract (original)

This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations with a primal dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework uses an actor critic method with multiple critics to estimate the critic uncertainty and build an optimistic critic fundamental to drive exploration. When instantiated in the tabular setting, we get a provable algorithm with guarantees that matches the best known results in $\epsilon$. Practically, the SOAR template is shown to boost consistently the performance of imitation learning algorithms based on Soft Actor Critic such as f-IRL, ML-IRL and CSIL in several MuJoCo environments. Overall, thanks to SOAR, the required number of episodes to achieve the same performance is reduced by half.

arXiv information

Authors	Stefano Viel, Luca Viano, Volkan Cevher
Published	2025-05-30 16:16:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
