CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization

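Summary

Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful, complementary tools for solving optimal control problems.
TO computes locally optimal solutions efficiently, but it tends to get stuck in local minima when the problem is not convex.
RL is typically less sensitive to non-convexity, but it requires far more computation.
We recently proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm.
The policy encoded by the actor is in turn used to warm-start TO, which closes the loop between TO and RL.
This work presents an extension of CACTO that exploits the idea of Sobolev learning.
To make training of the critic network faster and more data efficient, the training targets are enriched with the gradient of the Value function, computed via the backward pass of the differential dynamic programming algorithm.
The results show that the new algorithm is more efficient than the original CACTO: it reduces the number of TO episodes by a factor ranging from 3 to 10, and the computation time drops accordingly.
The results also show that CACTO-SL helps TO find better minima and produce more consistent results.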

Abstract (original)

Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
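
The Sobolev learning idea mentioned in the abstract can be illustrated with a toy example: the critic is fitted not only to Value targets but also to targets for the Value gradient. The sketch below is a minimal one-dimensional illustration and not the CACTO-SL implementation. The quadratic critic, the `grad_weight` parameter, the learning rate, and the stand-in target function are all assumptions made for this example. In the paper the gradient targets come from the backward pass of differential dynamic programming, whereas here they are simply computed analytically from a made-up Value function.

```python
# Toy 1-D Sobolev regression. All names and numbers are hypothetical
# illustrations; this is not the code used in CACTO-SL.

def critic(params, x):
    """Quadratic critic V(x) = a*x^2 + b*x + c; returns (V, dV/dx)."""
    a, b, c = params
    return a * x * x + b * x + c, 2.0 * a * x + b


def sobolev_loss(params, data, grad_weight=1.0):
    """Mean squared Value error plus weighted mean squared gradient error."""
    total = 0.0
    for x, v_target, g_target in data:
        v, g = critic(params, x)
        total += (v - v_target) ** 2 + grad_weight * (g - g_target) ** 2
    return total / len(data)


def sobolev_loss_grad(params, data, grad_weight=1.0):
    """Analytic derivative of sobolev_loss with respect to (a, b, c)."""
    da = db = dc = 0.0
    for x, v_target, g_target in data:
        v, g = critic(params, x)
        err_v, err_g = v - v_target, g - g_target
        # dV/da = x^2, dV/db = x, dV/dc = 1; d(dV/dx)/da = 2x, d(dV/dx)/db = 1
        da += 2.0 * err_v * x * x + grad_weight * 2.0 * err_g * 2.0 * x
        db += 2.0 * err_v * x + grad_weight * 2.0 * err_g
        dc += 2.0 * err_v
    n = len(data)
    return da / n, db / n, dc / n


def fit(data, grad_weight=1.0, lr=0.05, steps=2000):
    """Plain gradient descent on the Sobolev loss, starting from zero."""
    params = (0.0, 0.0, 0.0)
    for _ in range(steps):
        grads = sobolev_loss_grad(params, data, grad_weight)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    return params


# Stand-in targets (assumption): V*(x) = 3x^2 - 2x + 1, dV*/dx = 6x - 2.
# Only two states are sampled, fewer than the three critic parameters.
data = [(x, 3 * x * x - 2 * x + 1, 6 * x - 2) for x in (-1.0, 0.5)]

sobolev_params = fit(data, grad_weight=1.0)   # Values and gradients
value_only_params = fit(data, grad_weight=0.0)  # Values only

print("Sobolev fit:   ", sobolev_params)
print("Value-only fit:", value_only_params)
```

With only two sampled states, the Value targets alone cannot pin down the three critic parameters, so the value-only fit matches the two samples without recovering the underlying function. Adding the two gradient targets makes the problem well determined. This is a small-scale picture of why gradient information can make critic training more data efficient.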

arXiv information

Authors	Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini, Justin Carpentier, Andrea Del Prete
Published	2023-12-17 09:44:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
