Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

Summary

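This paper examines safe reinforcement learning (RL) in partially observable environments, with the goal of achieving safe-reachability objectives. In conventional partially observable Markov decision processes (POMDPs), safety is typically ensured by estimating a belief over the latent states. However, accurately estimating the optimal Bayesian filter that infers latent states from observations in a continuous state space is a major challenge, largely because the likelihood is intractable. To address this, the authors propose a stochastic model-based approach that guarantees RL safety almost surely, even under unknown system dynamics and partial observability. They use the Predictive State Representation (PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically, and the results in this setting are provable. They also derive the essential operators from the kernel Bayes' rule, which enables recursive estimation of future observations using these operators. Under the assumption of \textit{undercompleteness}, a polynomial sample complexity is established for the RL algorithm with infinite observation and action spaces, which guarantees an $\epsilon$-suboptimal safe policy.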

Summary (original)

This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDPs), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in a POMDP to infer latent states from observations in a continuous state space poses a significant challenge, largely due to the intractable likelihood. To tackle this issue, we propose a stochastic model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics and partial observation environments. We leveraged the Predictive State Representation (PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically, and the results in this context are provable. Furthermore, we derived essential operators from the kernel Bayes' rule, enabling the recursive estimation of future observations using various operators. Under the assumption of \textit{undercompleteness}, a polynomial sample complexity is established for the RL algorithm for the infinite size of observation and action spaces, ensuring an $\epsilon$-suboptimal safe policy guarantee.
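As a rough illustration of the kind of RKHS machinery the abstract refers to, the sketch below computes the weights of a standard empirical conditional mean embedding, which lets an expectation of a future quantity given a conditioning variable be written as a weighted sum over training samples. This is not the authors' algorithm: the Gaussian kernel, the bandwidth, the regularization value, the toy data, and every function name here are assumptions chosen only for illustration.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=0.5):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def conditional_embedding_weights(x_train, x_query, reg=1e-3, bandwidth=0.5):
    """Weights w(x) = (K + n * reg * I)^{-1} k(x) of the empirical embedding.

    E[f(Y) | X = x] is then approximated by sum_i w_i(x) * f(y_i).
    """
    n = x_train.shape[0]
    gram = gaussian_gram(x_train, x_train, bandwidth)
    k_query = gaussian_gram(x_train, x_query, bandwidth)  # shape (n, m)
    return np.linalg.solve(gram + n * reg * np.eye(n), k_query)


# Toy data (hypothetical): the conditioning variable x plays the role of a
# history feature, and y plays the role of a noisy future observation.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=(200, 1))
y_train = np.sin(x_train) + 0.05 * rng.normal(size=(200, 1))

x_query = np.array([[0.5]])
weights = conditional_embedding_weights(x_train, x_query)

# With f taken as the identity, the weighted sum estimates E[Y | X = 0.5].
predicted = (weights.T @ y_train).item()
```

With `f` as the identity this reduces to kernel ridge regression; other choices of `f` reuse the same weights, which is the practical appeal of representing conditional distributions as RKHS embeddings.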

arXiv information

Authors	Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu
Published	2023-12-01 17:01:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
