Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models

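Abstract

Safety is one of the foremost concerns when applying reinforcement learning (RL) to the physical world. At its core, it is difficult to guarantee that an RL agent persistently satisfies hard state constraints without a white-box or black-box dynamics model. This paper proposes an integrated model learning and safe control framework for safeguarding an agent whose dynamics are learned as Gaussian processes. The proposed theory provides (i) a new method for constructing an offline dataset for model learning that best achieves the safety requirements, (ii) a parameterization rule for the safety index that guarantees the existence of a safe control, and (iii) a safety guarantee, in terms of probabilistic forward invariance, when the model is learned from that dataset. Simulation results show that the framework achieves almost zero safety violations on a variety of continuous control tasks.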

Abstract (original)

Safety is one of the biggest concerns to applying reinforcement learning (RL) to the physical world. In its core part, it is challenging to ensure RL agents persistently satisfy a hard state constraint without white-box or black-box dynamics models. This paper presents an integrated model learning and safe control framework to safeguard any agent, where its dynamics are learned as Gaussian processes. The proposed theory provides (i) a novel method to construct an offline dataset for model learning that best achieves safety requirements; (ii) a parameterization rule for safety index to ensure the existence of safe control; (iii) a safety guarantee in terms of probabilistic forward invariance when the model is learned using the aforementioned dataset. Simulation results show that our framework guarantees almost zero safety violation on various continuous control tasks.

arXiv information

Authors	Weiye Zhao, Tairan He, Changliu Liu
Published	2023-06-30 19:10:50+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
