We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many \texttt{sa}-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary’s strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.
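To make the brute-force baseline mentioned in the abstract concrete, the following is a minimal sketch of robust policy evaluation by random search. It is not the authors' algorithm. The abstract does not define the uncertainty set, so the sketch assumes one: kernels of the form $P_0 + \Delta$ where the entrywise $L_p$ norm of $\Delta$ over the whole kernel is at most `beta`, each row of $\Delta$ sums to zero, and $P_0 + \Delta$ stays nonnegative. The function names, `beta`, and the sampling scheme are all illustrative assumptions.

```python
import numpy as np

def policy_return(P, r, mu, gamma):
    """Discounted return mu^T (I - gamma P)^{-1} r for a fixed policy."""
    n = P.shape[0]
    v = np.linalg.solve(np.eye(n) - gamma * P, r)
    return float(mu @ v)

def brute_force_robust_return(P0, r, mu, gamma, beta, p=2, n_samples=2000, seed=0):
    """Random-search estimate of the worst-case return (hypothetical helper).

    Assumed uncertainty set: {P0 + D : ||D||_p <= beta, rows of D sum to 0,
    P0 + D >= 0}. The norm couples all states, so the set is non-rectangular.
    The result is only an upper bound on the true robust return.
    """
    rng = np.random.default_rng(seed)
    best = policy_return(P0, r, mu, gamma)  # zero perturbation is feasible
    for _ in range(n_samples):
        noise = rng.standard_normal(P0.shape)
        noise -= noise.mean(axis=1, keepdims=True)  # each row sums to zero
        norm = np.linalg.norm(noise.ravel(), ord=p)
        if norm == 0.0:
            continue
        noise *= beta * rng.uniform() / norm  # rescale into the L_p ball
        P = P0 + noise
        if (P < 0).any():
            continue  # reject samples that are not valid kernels
        best = min(best, policy_return(P, r, mu, gamma))
    return best

# Usage on a two-state chain induced by some fixed policy.
P0 = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
mu = np.array([0.5, 0.5])
nominal = policy_return(P0, r, mu, gamma=0.9)
robust = brute_force_robust_return(P0, r, mu, gamma=0.9, beta=0.1)
print(nominal, robust)
```

Because the search only samples the set, its cost grows quickly with the number of states, which is the kind of baseline a structured dual formulation would be expected to improve on.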
Authors | Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor |
Published | 2025-02-13 15:55:00+00:00 |
arXiv page | arxiv_id(pdf) |