Neural Optimal Control using Learned System Dynamics

Summary

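We study the problem of generating control laws for systems whose dynamics are unknown.
Our approach represents the controller and the value function as neural networks and trains them with loss functions adapted from the Hamilton-Jacobi-Bellman (HJB) equations.
When no dynamics model is known, the method first learns the state transitions from data collected offline by interacting with the system.
The learned transition function is then integrated into the HJB equations and used to forward-simulate, in a feedback loop, the control signals produced by the controller.
Unlike trajectory optimization methods, which optimize the controller for a single initial state, our controller can generate near-optimal control signals for initial states drawn from a large portion of the state space.
Compared with recent model-based reinforcement learning algorithms, we show that the method is more sample efficient and trains an order of magnitude faster.
We demonstrate the method on a number of tasks, including control of a quadrotor with 12 state variables.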
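
For orientation, the summary refers to losses adapted from the HJB equations without stating them. The block below gives the textbook stationary HJB equation for an undiscounted, infinite-horizon problem, followed by one plausible residual-style loss. The symbols (running cost \ell, learned dynamics \hat{f}, controller \pi_\theta, value network V_\phi) are assumptions introduced here for illustration, and the loss actually used in the paper may differ.

```latex
% Textbook stationary HJB equation (not copied from the paper):
0 = \min_{u} \left[ \ell(x, u) + \nabla_x V(x)^{\top} f(x, u) \right]

% Assumed residual-style loss, with the learned model \hat{f} in place of f:
\mathcal{L}(\theta, \phi) = \mathbb{E}_{x} \left[ \left( \ell\bigl(x, \pi_\theta(x)\bigr)
  + \nabla_x V_\phi(x)^{\top} \hat{f}\bigl(x, \pi_\theta(x)\bigr) \right)^{2} \right]
```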

Summary (original)

We study the problem of generating control laws for systems with unknown dynamics. Our approach is to represent the controller and the value function with neural networks, and to train them using loss functions adapted from the Hamilton-Jacobi-Bellman (HJB) equations. In the absence of a known dynamics model, our method first learns the state transitions from data collected by interacting with the system in an offline process. The learned transition function is then integrated to the HJB equations and used to forward simulate the control signals produced by our controller in a feedback loop. In contrast to trajectory optimization methods that optimize the controller for a single initial state, our controller can generate near-optimal control signals for initial states from a large portion of the state space. Compared to recent model-based reinforcement learning algorithms, we show that our method is more sample efficient and trains faster by an order of magnitude. We demonstrate our method in a number of tasks, including the control of a quadrotor with 12 state variables.
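
The pipeline described above (collect offline data, learn the transitions, plug the learned model into the HJB equation, then forward-simulate in a feedback loop) can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the authors' implementation: the system is a hypothetical scalar linear one, the cost is quadratic, least squares stands in for the neural dynamics model, and a closed-form quadratic value function and linear controller stand in for the neural networks. All names and constants (`A_TRUE`, `B_TRUE`, `Q`, `R`, `DT`, and every function) are invented for this example.

```python
import math
import random

# Hypothetical "true" system, unknown to the learner: dx/dt = A_TRUE*x + B_TRUE*u
A_TRUE, B_TRUE = 1.0, 2.0
Q, R = 1.0, 1.0   # assumed running cost: l(x, u) = Q*x^2 + R*u^2
DT = 0.01         # integration step used for data collection and rollouts


def step_true(x, u):
    """One Euler step of the true system (the only access the learner has)."""
    return x + DT * (A_TRUE * x + B_TRUE * u)


def collect_offline_data(n, seed=0):
    """Interact with the system offline using random states and controls."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((x, u, step_true(x, u)))
    return data


def fit_dynamics(data):
    """Least-squares fit of dx/dt ~ a*x + b*u (stand-in for a neural model)."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, x_next in data:
        y = (x_next - x) / DT  # finite-difference estimate of dx/dt
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def solve_value_and_controller(a, b):
    """Closed-form V(x) = p*x^2 and u = -k*x for the scalar LQ problem."""
    p = R * (a + math.sqrt(a * a + b * b * Q / R)) / (b * b)
    k = b * p / R
    return p, k


def hjb_residual(x, a, b, p, k):
    """l(x, u) + V'(x) * f_hat(x, u), evaluated at u = -k*x."""
    u = -k * x
    return Q * x * x + R * u * u + 2.0 * p * x * (a * x + b * u)


def rollout(x0, a, b, k, steps=500):
    """Forward-simulate the learned model under feedback control."""
    x = x0
    for _ in range(steps):
        x = x + DT * (a * x + b * (-k * x))
    return x


data = collect_offline_data(200)
a_hat, b_hat = fit_dynamics(data)
p, k = solve_value_and_controller(a_hat, b_hat)
max_residual = max(abs(hjb_residual(i / 10.0, a_hat, b_hat, p, k))
                   for i in range(-10, 11))
x_final = rollout(1.0, a_hat, b_hat, k)
```

Because the controller is a function of the state rather than a single open-loop trajectory, the same `k` can be used from any initial state passed to `rollout`, which mirrors the summary's contrast with trajectory optimization. In the neural setting the closed-form solve would be replaced by gradient-based minimization of an HJB-derived loss.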

arXiv information

Authors	Selim Engin, Volkan Isler
Published	2023-02-20 09:07:58+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
