Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms

Summary

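Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios.
In this paper, we present the generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems.
We then propose a solution to the GSE problem in the form of MASE, a meta-algorithm for safe exploration. MASE combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode, while properly penalizing unsafe exploration before any actual safety violation occurs so that it is discouraged in future episodes.
The advantage of MASE is that it can optimize a policy while guaranteeing, with high probability and under suitable assumptions, that no safety constraint is violated.
Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models, with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process, to ensure safety, with a deep RL algorithm, to maximize reward.
Finally, we demonstrate that our proposed algorithm outperforms state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.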
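The summary describes MASE as an unconstrained RL learner wrapped by an uncertainty quantifier that blocks unsafe actions now and penalizes the agent before a violation happens. The sketch below illustrates that idea for a single decision step. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a per-step cost-style safety constraint (lower is safer, must stay at or below `threshold`), a quantifier that supplies a point estimate `mean_cost` and a confidence width `width`, a `score` function standing in for the unconstrained learner's action preferences, and a hypothetical emergency stop (represented by `None`) that carries a penalty `stop_penalty`. All of these names are illustrative.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Action = Hashable
StateActionFn = Callable[[object, Action], float]


def certified_safe_actions(
    state: object,
    actions: Sequence[Action],
    mean_cost: StateActionFn,   # hypothetical: quantifier's point estimate of the safety cost
    width: StateActionFn,       # hypothetical: quantifier's confidence width for that estimate
    threshold: float,
) -> List[Action]:
    """Keep actions whose pessimistic cost estimate (mean + width) stays within the threshold."""
    return [a for a in actions if mean_cost(state, a) + width(state, a) <= threshold]


def mase_like_step(
    state: object,
    actions: Sequence[Action],
    score: StateActionFn,       # hypothetical: unconstrained learner's preference (e.g. a Q-value)
    mean_cost: StateActionFn,
    width: StateActionFn,
    threshold: float,
    stop_penalty: float,
) -> Tuple[Optional[Action], float]:
    """One decision step of a MASE-style safety wrapper (illustrative sketch).

    Returns (action, reward_adjustment). If no action can be certified safe, the
    agent takes a hypothetical emergency stop (None) and receives -stop_penalty,
    which discourages the unconstrained learner from reaching such states again.
    """
    safe = certified_safe_actions(state, actions, mean_cost, width, threshold)
    if not safe:
        return None, -stop_penalty
    # Among certified-safe actions, follow the unconstrained learner's preference.
    return max(safe, key=lambda a: score(state, a)), 0.0
```

With, say, `mean_cost = lambda s, a: 0.1 * a`, a constant width of `0.05`, and `threshold = 0.2`, only actions `0` and `1` are certified, so the learner's preferred action `2` is filtered out. Penalizing the stop rather than the violation itself is what lets the learner be discouraged before any constraint is actually broken.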

Summary (original)

Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
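One variant in the abstract uses a Gaussian process (GP) as the uncertainty quantifier. The sketch below shows how a GP posterior can provide the confidence interval such a quantifier needs: the pessimistic cost bound is the posterior mean plus `beta` times the posterior standard deviation. It is a minimal, dependency-free sketch under stated assumptions, not the paper's construction: the RBF kernel with unit signal variance, the fixed `lengthscale`, `noise_var` and `beta` values, the 1-D input, and the cost-style threshold check are all illustrative choices, and the class and method names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(x1: float, x2: float, lengthscale: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L: Matrix, z: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPSafetyQuantifier:
    """Hypothetical GP quantifier over a scalar safety cost: bounds are mean +/- beta * std."""

    def __init__(self, xs: Sequence[float], costs: Sequence[float],
                 noise_var: float = 0.01, lengthscale: float = 1.0, beta: float = 2.0):
        self.xs = list(xs)
        self.ls = lengthscale
        self.beta = beta
        n = len(self.xs)
        K = [[rbf(self.xs[i], self.xs[j], lengthscale) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        # alpha = (K + noise_var * I)^{-1} y, computed with two triangular solves.
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, list(costs)))

    def predict(self, x: float) -> Tuple[float, float]:
        """Posterior mean and standard deviation of the cost at x."""
        k_star = [rbf(x, xi, self.ls) for xi in self.xs]
        mean = sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.L, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # prior variance is 1 for this kernel
        return mean, math.sqrt(var)

    def bounds(self, x: float) -> Tuple[float, float]:
        """Confidence interval (lower, upper) for the cost at x."""
        mean, std = self.predict(x)
        return mean - self.beta * std, mean + self.beta * std

    def is_certified_safe(self, x: float, threshold: float) -> bool:
        """Certify x only if the pessimistic (upper) cost bound is within the threshold."""
        return self.bounds(x)[1] <= threshold
```

For example, with observed costs 0.0 and 1.0 at x = 0.0 and x = 1.0, the quantifier certifies x = 0.0 against a threshold of 0.5 but not x = 10.0, where the posterior has reverted to the prior and the upper bound is about 2. The Cholesky solve is written out only to keep the sketch self-contained; in practice a GP library would normally supply the posterior mean and standard deviation.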

arXiv information

Authors	Akifumi Wachi, Wataru Hashimoto, Xun Shen, Kazumune Hashimoto
Published	2023-10-05 00:47:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
