Guiding Reinforcement Learning with Incomplete System Dynamics

Summary

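Model-free reinforcement learning (RL) is inherently a reactive method: it assumes no prior knowledge of the system at the start and relies entirely on trial and error to learn.
This approach faces several challenges, including poor sample efficiency, difficulty generalizing, and the need for well-designed reward functions to guide learning effectively.
By contrast, controllers based on complete system dynamics require no data.
This paper addresses the intermediate situation, in which there is not enough model information for a complete controller design but enough to suggest that a model-free approach is not the best choice either.
By carefully decoupling the known and unknown information about the system dynamics, we obtain an embedded controller guided by the partial model, which improves the learning efficiency of the RL-enhanced approach.
The modular design allows mainstream RL algorithms to be deployed to refine the policy.
Simulation results show that our method significantly improves sample efficiency over standard RL methods on continuous control tasks and also outperforms traditional control approaches.
Experiments on a real ground vehicle further validate the method's performance, including its generalization and robustness.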

Summary (original)

Model-free reinforcement learning (RL) is inherently a reactive method, operating under the assumption that it starts with no prior knowledge of the system and entirely depends on trial-and-error for learning. This approach faces several challenges, such as poor sample efficiency, generalization, and the need for well-designed reward functions to guide learning effectively. On the other hand, controllers based on complete system dynamics do not require data. This paper addresses the intermediate situation where there is not enough model information for complete controller design, but there is enough to suggest that a model-free approach is not the best approach either. By carefully decoupling known and unknown information about the system dynamics, we obtain an embedded controller guided by our partial model and thus improve the learning efficiency of an RL-enhanced approach. A modular design allows us to deploy mainstream RL algorithms to refine the policy. Simulation results show that our method significantly improves sample efficiency compared with standard RL methods on continuous control tasks, and also offers enhanced performance over traditional control approaches. Experiments on a real ground vehicle also validate the performance of our method, including generalization and robustness.

arXiv information

Authors	Shuyuan Wang, Jingliang Duan, Nathan P. Lawrence, Philip D. Loewen, Michael G. Forbes, R. Bhushan Gopaluni, Lixian Zhang
Published	2024-10-22 08:48:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
