Convergence of SARSA with linear function approximation: The random horizon case

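Summary

The reinforcement learning algorithm SARSA combined with linear function approximation has been shown to converge for infinite horizon discounted Markov decision problems (MDPs).
This paper investigates the convergence of the algorithm for random horizon MDPs, which has not been shown before.
As with earlier results for infinite horizon discounted MDPs, the algorithm is shown to converge with probability 1 on random horizon MDPs if the behaviour policy is $\varepsilon$-soft and Lipschitz continuous with respect to the weight vector of the linear function approximation, with a sufficiently small Lipschitz constant.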

Summary (original)

The reinforcement learning algorithm SARSA combined with linear function approximation has been shown to converge for infinite horizon discounted Markov decision problems (MDPs). In this paper, we investigate the convergence of the algorithm for random horizon MDPs, which has not previously been shown. We show, similar to earlier results for infinite horizon discounted MDPs, that if the behaviour policy is $\varepsilon$-soft and Lipschitz continuous with respect to the weight vector of the linear function approximation, with small enough Lipschitz constant, then the algorithm will converge with probability one when considering a random horizon MDP.
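To make the setting concrete, here is a minimal Python sketch of on-policy SARSA with linear function approximation on a random horizon (episodic) MDP: each episode ends when a terminal state is reached, and the update does not bootstrap past that point. Everything specific below is a hypothetical choice for illustration, not the paper's construction: the 5-state chain, the one-hot features (tabular SARSA is a special case of the linear case), the values of `EPSILON`, `TAU` and `ALPHA`, and the form of the behaviour policy. The policy mixes a uniform distribution (weight `EPSILON`), which makes it $\varepsilon$-soft, with a softmax over $\theta^\top\phi(s,a)/\tau$, which is Lipschitz in $\theta$ with a constant that shrinks as `TAU` grows. A constant step size is used for simplicity, whereas stochastic-approximation convergence results usually assume diminishing step sizes; the sketch shows the update being analysed, not the convergence proof.

```python
import math
import random

# Hypothetical toy random horizon MDP: a chain with non-terminal states 0..4,
# terminal state 5, actions 0 = left and 1 = right, reward -1 per step.
N_STATES = 5
TERMINAL = N_STATES
ACTIONS = (0, 1)
EPSILON = 0.1  # weight of the uniform part, so every action has prob >= EPSILON/|A|
TAU = 5.0      # softmax temperature; larger TAU gives a smaller Lipschitz constant in theta
ALPHA = 0.05   # constant step size (illustrative only)


def features(s, a):
    """One-hot feature vector phi(s, a)."""
    phi = [0.0] * (N_STATES * len(ACTIONS))
    phi[s * len(ACTIONS) + a] = 1.0
    return phi


def q_value(theta, s, a):
    """Linear action value theta . phi(s, a); the terminal state has value 0."""
    if s == TERMINAL:
        return 0.0
    return sum(t * f for t, f in zip(theta, features(s, a)))


def policy_probs(theta, s):
    """Epsilon-soft behaviour policy: uniform mixed with a softmax of Q / TAU."""
    prefs = [q_value(theta, s, a) / TAU for a in ACTIONS]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    k = len(ACTIONS)
    return [EPSILON / k + (1 - EPSILON) * e / z for e in exps]


def sample_action(theta, s, rng):
    return rng.choices(ACTIONS, weights=policy_probs(theta, s))[0]


def step(s, a, rng):
    """'Right' advances with prob 0.9 (else stays); 'left' moves back, floored at 0."""
    if a == 1 and rng.random() < 0.9:
        s_next = s + 1
    elif a == 0:
        s_next = max(s - 1, 0)
    else:
        s_next = s
    return s_next, -1.0


def sarsa_episode(theta, rng):
    """Run one episode, updating theta in place; the horizon is the random hitting time."""
    s = 0
    a = sample_action(theta, s, rng)
    while s != TERMINAL:
        s_next, r = step(s, a, rng)
        if s_next == TERMINAL:
            a_next = None
            target = r  # episode over: no bootstrap term
        else:
            a_next = sample_action(theta, s_next, rng)  # on-policy next action
            target = r + q_value(theta, s_next, a_next)
        td_error = target - q_value(theta, s, a)
        phi = features(s, a)
        for i in range(len(theta)):
            theta[i] += ALPHA * td_error * phi[i]
        s, a = s_next, a_next


def train(n_episodes=2000, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(n_episodes):
        sarsa_episode(theta, rng)
    return theta
```

Because every action keeps probability at least `EPSILON / 2` here, each episode reaches the terminal state with probability 1 under any weight vector, which is what makes the horizon random but almost surely finite. A quick look at the learned preferences: `theta = train(); print([round(q_value(theta, s, 1) - q_value(theta, s, 0), 2) for s in range(N_STATES)])`.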

arXiv information

Author	Lina Palmborg
Published	2023-06-07 15:51:06+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google