Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control

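Abstract

This work concerns the efficient training of neural network-based feedback controllers for optimal control problems.
We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization.
Although the training stage of the supervised learning approach is relatively easy, its success depends heavily on the optimal control dataset generated by open-loop optimal control solvers.
In contrast, direct policy optimization turns the optimal control problem directly into an optimization problem with no pre-computation required, but its dynamics-related objective can be hard to optimize when the problem is complicated.
Our results underscore the superiority of offline supervised learning in both optimality and training time.
To overcome the main challenge of each approach, the dataset for supervised learning and the optimization for direct policy optimization, we combine the two and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves performance and robustness significantly.
Our code is available at https://github.com/yzhao98/DeepOptimalControl.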
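
To make the three training routes named in the abstract concrete, here is a minimal sketch on a toy problem. Everything in it is an illustrative assumption rather than the authors' setup: the scalar linear-quadratic system and its constants, the one-parameter "network" u = -k*x, the Riccati recursion used as a stand-in for an open-loop solver, and the helper names `rollout_cost`, `open_loop_dataset`, `supervised_fit`, and `direct_policy_optimization`. The paper's actual implementation lives in the repository linked above.

```python
import numpy as np

# Hypothetical scalar problem (not from the paper):
# x_{t+1} = A*x_t + B*u_t, cost = sum_t (Q*x_t^2 + R*u_t^2) + Q*x_T^2.
A, B, Q, R, T = 1.0, 0.5, 1.0, 0.1, 30

def rollout_cost(k, x0s):
    """Mean closed-loop cost of the feedback policy u = -k*x over initial states."""
    x = np.asarray(x0s, dtype=float)
    cost = np.zeros_like(x)
    for _ in range(T):
        u = -k * x
        cost += Q * x**2 + R * u**2
        x = A * x + B * u
    return float(np.mean(cost + Q * x**2))

def open_loop_dataset(x0s):
    """Stand-in for an open-loop solver: a backward Riccati recursion
    yields optimal (state, control) pairs along each trajectory."""
    p, gains = Q, []
    for _ in range(T):
        k_t = A * B * p / (R + B**2 * p)
        p = Q + A**2 * p - A * B * p * k_t
        gains.append(k_t)
    gains.reverse()  # gains[t] is now the optimal gain at time t
    x = np.asarray(x0s, dtype=float)
    xs, us = [], []
    for k_t in gains:
        u = -k_t * x
        xs.append(x)
        us.append(u)
        x = A * x + B * u
    return np.concatenate(xs), np.concatenate(us)

def supervised_fit(xs, us):
    """Offline supervised learning: least-squares fit of u = -k*x to the dataset."""
    return float(-np.sum(xs * us) / np.sum(xs**2))

def direct_policy_optimization(k, x0s, steps=50, h=1e-5):
    """Online direct policy optimization: gradient descent on the rollout cost,
    using a finite-difference gradient and step halving so the cost never increases."""
    cost = rollout_cost(k, x0s)
    for _ in range(steps):
        grad = (rollout_cost(k + h, x0s) - rollout_cost(k - h, x0s)) / (2 * h)
        lr = 1.0
        for _ in range(40):
            k_new = k - lr * grad
            cost_new = rollout_cost(k_new, x0s)
            if cost_new < cost:
                k, cost = k_new, cost_new
                break
            lr *= 0.5
    return k

x0s = np.linspace(-1.0, 1.0, 11)
k_sl = supervised_fit(*open_loop_dataset(x0s))   # pre-train on solver data
k_ft = direct_policy_optimization(k_sl, x0s)     # fine-tune the pre-trained policy
k_dpo = direct_policy_optimization(0.0, x0s)     # direct optimization from scratch
print(rollout_cost(k_sl, x0s), rollout_cost(k_ft, x0s), rollout_cost(k_dpo, x0s))
```

On a problem this small all three routes should land near the same gain, so the sketch only shows how the pieces fit together; it says nothing about the performance gaps the paper reports on harder problems.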

Abstract (original)

This work is concerned with solving neural network-based feedback controllers efficiently for optimal control problems. We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization. Albeit the training part of the supervised learning approach is relatively easy, the success of the method heavily depends on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem into an optimization problem directly without any requirement of pre-computing, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results underscore the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenges, dataset and optimization, in the two approaches respectively, we complement them and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves the performance and robustness significantly. Our code is accessible at https://github.com/yzhao98/DeepOptimalControl.

arXiv information

Authors	Yue Zhao, Jiequn Han
Published	2024-04-09 17:45:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
