A Pontryagin Perspective on Reinforcement Learning

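Summary

Reinforcement learning has traditionally focused on learning state-dependent policies that solve optimal control problems in a closed-loop fashion.
In this work, we instead introduce the paradigm of open-loop reinforcement learning, in which a fixed action sequence is learned.
We present three new algorithms: one robust model-based method and two sample-efficient model-free methods.
Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control.
We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.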

Summary (original)

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
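
To make the open-loop idea concrete, here is a minimal sketch of how a fixed action sequence can be improved with a Pontryagin-style costate (adjoint) recursion. It is not one of the paper's three algorithms. It assumes a known scalar linear model x_{t+1} = a*x_t + b*u_t with a quadratic cost to minimize, and every name and parameter value (rollout, total_cost, adjoint_gradient, optimize_open_loop, a, b, q, r, lr) is hypothetical and chosen only for illustration.

```python
def rollout(x0, actions, a=0.9, b=0.5):
    """Simulate the assumed model x_{t+1} = a*x_t + b*u_t; return all states."""
    xs = [x0]
    for u in actions:
        xs.append(a * xs[-1] + b * u)
    return xs


def total_cost(x0, actions, a=0.9, b=0.5, q=1.0, r=0.1):
    """Sum of 0.5*q*x_t^2 + 0.5*r*u_t^2 over the horizon plus 0.5*q*x_T^2."""
    xs = rollout(x0, actions, a, b)
    running = sum(0.5 * q * x * x + 0.5 * r * u * u for x, u in zip(xs, actions))
    return running + 0.5 * q * xs[-1] ** 2


def adjoint_gradient(x0, actions, a=0.9, b=0.5, q=1.0, r=0.1):
    """Gradient of total_cost w.r.t. each action via a backward costate pass."""
    xs = rollout(x0, actions, a, b)
    horizon = len(actions)
    lam = q * xs[horizon]  # terminal costate: derivative of the terminal cost
    grad = [0.0] * horizon
    for t in reversed(range(horizon)):
        grad[t] = r * actions[t] + b * lam  # dJ/du_t = r*u_t + b*lambda_{t+1}
        lam = q * xs[t] + a * lam           # lambda_t = q*x_t + a*lambda_{t+1}
    return grad


def optimize_open_loop(x0, horizon=10, steps=500, lr=0.05):
    """Plain gradient descent on the action sequence itself (no policy)."""
    actions = [0.0] * horizon
    for _ in range(steps):
        grad = adjoint_gradient(x0, actions)
        actions = [u - lr * g for u, g in zip(actions, grad)]
    return actions
```

Calling optimize_open_loop(1.0) returns a list of ten actions whose cost is lower than that of the all-zero sequence. The optimized object is the action sequence itself, not a state-dependent policy, which is the distinction the abstract draws between open-loop and closed-loop methods.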

arXiv information

Authors	Onno Eberhard, Claire Vernade, Michael Muehlebach
Published	2025-04-22 17:39:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
