Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Summary

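We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and Q-NPG methods with the log-linear policy class.
Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method.
We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive, geometrically increasing step size, without resorting to entropy or other strongly convex regularization.
Finally, as a byproduct, we obtain sublinear convergence rates for both methods with an arbitrary constant step size.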

Abstract (original)

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
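
To make the ingredients named in the abstract concrete, here is a minimal sketch in plain Python of a log-linear policy, a geometrically increasing step-size schedule, and a single Q-NPG-style parameter update. Several details are assumptions made for illustration and are not stated in the abstract: the function names, the growth factor 1/gamma for the step size, the reward-maximization sign convention (under a cost-minimization convention the update sign flips), and the premise that the direction `w` has already been obtained from an approximate least-squares fit of the features to the Q-values.

```python
import math


def log_linear_policy(theta, features):
    """Action probabilities pi(a|s) proportional to exp(phi(s, a) . theta).

    `features` holds one feature vector phi(s, a) per action for a fixed state s.
    """
    logits = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def geometric_step_sizes(eta0, gamma, num_iters):
    """Non-adaptive schedule eta_k = eta0 / gamma**k.

    The growth factor 1/gamma is an assumption for illustration; the abstract
    only says the step size increases geometrically.
    """
    return [eta0 / gamma ** k for k in range(num_iters)]


def q_npg_step(theta, w, eta):
    """One update theta <- theta + eta * w (reward-maximization convention assumed).

    `w` is assumed to be a (possibly inexact) least-squares fit of
    phi(s, a) . w to Q(s, a) under the current policy.
    """
    return [t + eta * wi for t, wi in zip(theta, w)]
```

As a usage example, with two actions and one-hot features `[[1.0, 0.0], [0.0, 1.0]]`, a zero parameter vector gives a uniform policy, and one step along `w = [1.0, -1.0]` shifts probability toward the first action. The step sizes are fixed in advance and do not depend on quantities observed during training, which is what "non-adaptive" refers to.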

arXiv information

Authors	Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao
Published	2023-02-21 14:48:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
