Interval Markov Decision Processes with Continuous Action-Spaces

Summary

Title: Interval Markov Decision Processes with Continuous Action-Spaces

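Summary: Interval Markov decision processes (IMDPs) are finite-state uncertain Markov models whose transition probabilities belong to intervals. Recently, there has been a surge of research on using IMDPs as abstractions of stochastic systems for control synthesis. However, because no synthesis algorithms exist for IMDPs with continuous action spaces, the action space is assumed to be discrete a priori, which is a restrictive assumption for many applications. Motivated by this, the authors introduce continuous-action IMDPs (caIMDPs), in which the upper and lower bounds on the transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, they decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ maximization problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Exploiting the simple form of these maximization problems, they then identify cases in which value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). In addition, in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP whose actions are the vertices of $\mathcal{A}$ is sufficient for optimality. The results are demonstrated on a numerical example. Finally, the paper includes a short discussion on using caIMDPs as abstractions for control synthesis.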

Summary (original)

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a-priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated to value iteration to $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
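The abstract notes that, in certain cases, synthesis over a discrete-action IMDP whose actions are the vertices of $\mathcal{A}$ is sufficient. As a rough illustration of that discrete-action setting only, the sketch below runs robust (max-min) value iteration over a small IMDP. It is not the paper's caIMDP method: the inner minimization uses the standard sorting argument for interval bounds, and the discount factor, the state-based rewards, and all function and variable names are assumptions made here for illustration.

```python
from typing import List, Sequence

def worst_case_distribution(lower: Sequence[float],
                            upper: Sequence[float],
                            values: Sequence[float]) -> List[float]:
    """Distribution within [lower, upper] that minimizes the expected value."""
    # Start from the lower bounds, then give the leftover probability mass
    # to the lowest-value successor states first.
    probs = list(lower)
    remaining = 1.0 - sum(lower)
    for s in sorted(range(len(values)), key=lambda i: values[i]):
        bump = min(upper[s] - lower[s], remaining)
        probs[s] += bump
        remaining -= bump
    return probs

def robust_value_iteration(lower, upper, rewards,
                           discount=0.9, tol=1e-8, max_iter=10_000):
    """Max over actions, min over interval transition probabilities.

    lower[q][a][q2] and upper[q][a][q2] bound the probability of moving
    from state q to state q2 under action a (hypothetical data layout).
    The discount factor is an assumption of this sketch.
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iter):
        new_values = []
        for q in range(n):
            best = max(
                sum(p * v for p, v in zip(
                    worst_case_distribution(lower[q][a], upper[q][a], values),
                    values))
                for a in range(len(lower[q]))
            )
            new_values.append(rewards[q] + discount * best)
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values

# Toy IMDP: state 0 has two actions, state 1 is absorbing with reward 1.
lower = [[[0.1, 0.5], [0.3, 0.6]], [[0.0, 1.0]]]
upper = [[[0.5, 0.9], [0.4, 0.7]], [[0.0, 1.0]]]
rewards = [0.0, 1.0]
values = robust_value_iteration(lower, upper, rewards)
```

In a continuous-action setting, the finite loop over actions would be replaced by an optimization over the action set, which is the part the paper addresses.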

arXiv information

Authors	Giannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo Jr., Luca Laurenti
Published	2023-04-07 09:02:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
