Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Abstract

Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the presence of such symmetries. We demonstrate the effectiveness of our method on our environments, as well as on challenging visual control tasks from the DeepMind Control Suite. Our method’s ability to utilize MDP homomorphisms for representation learning leads to improved performance, and the visualizations of the latent space clearly demonstrate the structure of the learned abstraction.
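For orientation, the classical finite-MDP notion of a homomorphism, which the abstract says the paper extends to continuous state and action spaces, can be sketched as below. This is the standard discrete definition, not the paper's continuous one; the symbols (f, g_s, and the barred abstract MDP) are notation assumed here for illustration and do not come from this page.

```latex
% Assumed notation: M = (S, A, P, R) is the original MDP and
% \bar{M} = (\bar{S}, \bar{A}, \bar{P}, \bar{R}) is the abstract MDP.
% A homomorphism is a pair h = (f, \{g_s\}_{s \in S}) of surjective maps
%   f : S \to \bar{S}, \qquad g_s : A \to \bar{A},
% such that, for all s, s' \in S and a \in A,
\begin{align}
  \bar{R}\bigl(f(s),\, g_s(a)\bigr) &= R(s, a), \\
  \bar{P}\bigl(f(s') \mid f(s),\, g_s(a)\bigr)
    &= \sum_{s'' \in f^{-1}(f(s'))} P(s'' \mid s, a).
\end{align}
```

The first condition says rewards are preserved under the maps. The second says the abstract transition probability equals the total probability of landing in any original state that shares the abstract image of s'.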

arXiv information

Authors	Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup
Published	2024-03-07 17:26:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google