Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

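Summary

Partially observable Markov decision processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainty.
This paper studies a challenging and important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), in which the goal is to maximize the probability of reaching a set of target states.
It is also a core problem in model checking with logical specifications, and it is naturally undiscounted (the discount factor is 1).
Inspired by the success of point-based methods developed for discounted problems, the authors study their extension to MRPP.
Specifically, they focus on trial-based heuristic search value iteration techniques and present a new algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops in indefinite-horizon problems.
The algorithm produces policies with two-sided bounds on the optimal reachability probability.
The authors prove convergence to an optimal policy from below under certain conditions.
Experimental evaluation on a suite of benchmarks shows that the algorithm outperforms existing methods in almost all cases, in both probability guarantees and computation time.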

Summary (original)

Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
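
One well-known reason loops are troublesome for undiscounted reachability is that the Bellman update can have more than one fixed point, so a value bound initialized from above may never tighten. The sketch below illustrates this on a toy, fully observable MDP; it is not the paper's algorithm and does not involve beliefs. The state names, the `transitions` table, and the `bellman_iterate` helper are all assumptions made up for this illustration.

```python
# Toy MDP, purely illustrative (hypothetical, not taken from the paper).
# transitions[state][action] = list of (probability, next_state)
TARGET, SINK = "target", "sink"
transitions = {
    "s0": {
        "go": [(0.5, TARGET), (0.5, SINK)],  # reaches the target half the time
        "wait": [(1.0, "s0")],               # self-loop: never reaches the target
    },
}


def bellman_iterate(initial_value, sweeps=100):
    """Apply the undiscounted max-reachability Bellman update to s0."""
    values = {"s0": initial_value, TARGET: 1.0, SINK: 0.0}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            values[state] = max(
                sum(p * values[nxt] for p, nxt in outcomes)
                for outcomes in actions.values()
            )
    return values["s0"]


lower = bellman_iterate(0.0)  # start from the trivial lower bound
upper = bellman_iterate(1.0)  # start from the trivial upper bound
print(lower, upper)
```

In this toy model the true maximal reachability probability from `s0` is 0.5. Iterating from below reaches it, whereas iterating from above stays at 1 because the `wait` self-loop keeps reproducing the optimistic value. With a discount factor below one the update would be a contraction and both runs would agree.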

arXiv information

Authors	Qi Heng Ho, Martin S. Feather, Federico Rossi, Zachary N. Sunberg, Morteza Lahijanian
Published	2024-06-05 02:33:50+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
