Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

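Summary

Shielding is a popular technique for achieving safe reinforcement learning (RL).
However, classical shielding approaches rely on quite restrictive assumptions, which makes them difficult to deploy in complex environments, particularly those with continuous state or action spaces.
This paper extends the more versatile approximate model-based shielding (AMBS) framework to the continuous setting.
In particular, it uses Safety Gym as a test-bed, allowing a more direct comparison of AMBS with popular constrained RL algorithms.
It also provides strong probabilistic safety guarantees for the continuous setting.
In addition, it proposes two novel penalty techniques that directly modify the policy gradient, which empirically yield more stable convergence in the experiments.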

Abstract (original)

Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.
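To make the idea of a look-ahead shield concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a simulator `model_step` stands in for a learned dynamics model, estimates the probability of reaching an unsafe state within a fixed horizon by Monte Carlo rollouts, and falls back to a backup policy when that estimate exceeds a threshold. All names (`estimate_violation_prob`, `shielded_action`, the toy 1D environment, the threshold and noise values) are hypothetical and chosen only for illustration.

```python
import random

def estimate_violation_prob(state, policy, model_step, is_unsafe,
                            horizon=10, n_rollouts=200, rng=None):
    """Monte Carlo estimate of P(an unsafe state is reached within `horizon` steps)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    violations = 0
    for _ in range(n_rollouts):
        s = state
        for _ in range(horizon):
            s = model_step(s, policy(s), rng)
            if is_unsafe(s):
                violations += 1
                break
    return violations / n_rollouts

def shielded_action(state, task_policy, backup_policy, model_step, is_unsafe,
                    threshold=0.1, **rollout_kwargs):
    """Return the task action unless the estimated violation probability is too high."""
    p = estimate_violation_prob(state, task_policy, model_step, is_unsafe,
                                **rollout_kwargs)
    return backup_policy(state) if p > threshold else task_policy(state)

# Hypothetical toy problem: a point on a line that must stay within |x| <= 1.
def toy_model_step(x, action, rng):
    return x + action + rng.gauss(0.0, 0.02)  # assumed noisy dynamics

def toy_is_unsafe(x):
    return abs(x) > 1.0

def toy_task_policy(x):
    return 0.2  # always pushes right, ignoring safety

def toy_backup_policy(x):
    return -0.2 if x > 0 else 0.2  # steers back toward the origin

near_edge = shielded_action(0.9, toy_task_policy, toy_backup_policy,
                            toy_model_step, toy_is_unsafe, horizon=3)
at_center = shielded_action(0.0, toy_task_policy, toy_backup_policy,
                            toy_model_step, toy_is_unsafe, horizon=3)
print(near_edge, at_center)
```

In this toy setup the shield overrides the task policy near the boundary and leaves it alone at the center. The threshold trades off how often the backup policy intervenes against how much violation risk is tolerated; the values used here are arbitrary.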

arXiv information

Authors	Alexander W. Goodall, Francesco Belardinelli
Published	2024-02-01 17:55:08+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
