Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

Abstract

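In modern machine learning, a model can often fit its training data in many different ways, some of which perform well on unseen (test) data while others do not.
Notably, in such cases gradient descent often exhibits an implicit bias that yields excellent performance on unseen data.
This implicit bias has been studied extensively in supervised learning, but it is far less understood in optimal control (reinforcement learning).
There, learning a controller applied to a system via gradient descent is known as policy gradient, and the key question is how well the learned controller extrapolates to unseen initial states.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, it establishes that the extent of extrapolation depends on the degree of exploration the system induces when started from the initial states included in training.
Experiments support our theory and demonstrate its conclusions on problems beyond LQR, where the system is non-linear and the controller is a neural network.
We hypothesize that real-world optimal control could be greatly improved by developing methods for informed selection of the initial states to train on.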

Abstract (original)

In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Experiments corroborate our theory, and demonstrate its conclusions on problems beyond LQR, where systems are non-linear and controllers are neural networks. We hypothesize that real-world optimal control may be greatly improved by developing methods for informed selection of initial states to train on.
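The dependence of extrapolation on exploration can be illustrated with a toy sketch. This is not the paper's experimental setup: it assumes a two-dimensional linear system x_{t+1} = (A + K) x_t (that is, B = I), a cost that sums ||x_t||^2 over a horizon of 4 (Q = I, R = 0), and central finite differences standing in for an exact policy gradient. The names `rollout_cost` and `train` are hypothetical. The controller K is trained from the single initial state e1 and then evaluated from the unseen initial state e2, once for a system that moves e1 onto e2 (a shift matrix) and once for a system that never leaves e1 (the identity).

```python
def rollout_cost(A, K, x0, horizon):
    """Return sum_{t=0}^{horizon} ||x_t||^2 for x_{t+1} = (A + K) x_t."""
    n = len(x0)
    x = list(x0)
    cost = 0.0
    for _ in range(horizon + 1):
        cost += sum(v * v for v in x)
        # Closed-loop step with B = I (an assumption of this sketch).
        x = [sum((A[i][j] + K[i][j]) * x[j] for j in range(n)) for i in range(n)]
    return cost


def train(A, x0_train, horizon=4, lr=0.02, steps=200, eps=1e-5):
    """Gradient descent on K, starting at K = 0, using one training initial state."""
    n = len(A)
    K = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                k = K[i][j]
                K[i][j] = k + eps
                up = rollout_cost(A, K, x0_train, horizon)
                K[i][j] = k - eps
                down = rollout_cost(A, K, x0_train, horizon)
                K[i][j] = k  # restore the entry before moving on
                # Central finite difference, a stand-in for the exact gradient.
                grad[i][j] = (up - down) / (2 * eps)
        for i in range(n):
            for j in range(n):
                K[i][j] -= lr * grad[i][j]
    return K


HORIZON = 4
e1, e2 = [1.0, 0.0], [0.0, 1.0]          # e1 is seen in training, e2 is not
shift = [[0.0, 1.0], [1.0, 0.0]]         # moves e1 onto e2: induces exploration
identity = [[1.0, 0.0], [0.0, 1.0]]      # keeps the state on e1: no exploration
zero_K = [[0.0, 0.0], [0.0, 0.0]]

K_shift = train(shift, e1, horizon=HORIZON)
K_identity = train(identity, e1, horizon=HORIZON)

for name, A, K in [("shift", shift, K_shift), ("identity", identity, K_identity)]:
    before = rollout_cost(A, zero_K, e2, HORIZON)
    after = rollout_cost(A, K, e2, HORIZON)
    print(f"{name}: cost from unseen e2 before={before:.3f} after={after:.3f}")
```

In this sketch the shift system lets training from e1 also shape the column of K that acts on e2, so the cost from the unseen state drops, while under the identity system that column never receives a gradient and the cost from e2 is unchanged.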

arXiv information

Authors	Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen
Published	2024-02-12 18:41:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
