CACTO: Continuous Actor-Critic with Trajectory Optimization — Towards global optimality

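Summary

Title: CACTO: Aiming for global optimality with trajectory optimization and continuous actor-critic

Summary:

– This paper proposes a new algorithm for the continuous control of dynamical systems that combines trajectory optimization (TO) and reinforcement learning (RL) in a single framework.
– The algorithm is motivated by two main limitations that TO and RL face when they are applied to continuous nonlinear systems to minimize a non-convex cost function.
– Specifically, TO can get stuck in a poor local minimum when the search is not initialized close to a "good" minimum. RL, when dealing with continuous state and control spaces, can require an excessively long training process that depends strongly on the exploration strategy.
– The algorithm therefore learns a "good" control policy through TO-guided RL policy search. When that policy is used as the initial-guess provider for TO, the trajectory optimization process becomes less likely to converge to a poor local optimum.
– The method was applied to several reaching problems featuring non-convex obstacle avoidance and validated on different dynamical systems, including a car model with a 6D state and a 3-joint planar manipulator. The results show that CACTO can escape local minima and is more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.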
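
The sensitivity of a local optimizer to its initial guess, which motivates the approach summarized above, can be seen in a one-dimensional toy problem. The sketch below is not the CACTO algorithm: the cost function, the gradient-descent routine, and the `coarse_guess` helper are all hypothetical and chosen only for illustration. Plain gradient descent stands in for the trajectory optimizer, and picking the cheapest point on a coarse grid stands in for a learned policy that supplies the warm start.

```python
# Toy illustration only: a 1-D non-convex cost, not the CACTO algorithm itself.

def cost(x):
    """Non-convex cost with a poor local minimum near x = 1.13
    and a better minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Analytic derivative of cost."""
    return 4 * x**3 - 6 * x + 1


def local_optimize(x0, lr=0.01, iters=2000):
    """Plain gradient descent, standing in for a local trajectory optimizer."""
    x = x0
    for _ in range(iters):
        x -= lr * grad(x)
    return x


def coarse_guess(candidates):
    """Hypothetical stand-in for a learned policy: return the cheapest candidate."""
    return min(candidates, key=cost)


# Cold start far from the good minimum: converges to the poor local minimum.
x_cold = local_optimize(2.0)

# Warm start from the coarse guess: converges to the better minimum.
grid = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
x_warm = local_optimize(coarse_guess(grid))

print(f"cold start: x = {x_cold:.3f}, cost = {cost(x_cold):.3f}")
print(f"warm start: x = {x_warm:.3f}, cost = {cost(x_warm):.3f}")
```

The same local optimizer reaches a lower cost only because its starting point changed, which is the role the summary assigns to the learned policy as initial-guess provider for TO.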

Abstract (original)

This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The motivations behind this algorithm are the two main limitations of TO and RL when applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a ‘good’ minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a ‘good’ control policy via TO-guided RL policy search that, when used as initial guess provider for TO, makes the trajectory optimization process less prone to converge to poor local optima. Our method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems, including a car model with 6D state, and a 3-joint planar manipulator. Our results show the great capabilities of CACTO in escaping local minima, while being more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.

arXiv information

Authors	Gianluigi Grandesso, Elisa Alboni, Gastone P. Rosati Papini, Patrick M. Wensing, Andrea Del Prete
Published	2023-05-08 12:48:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
