The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

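Summary

This paper concerns the stochastic approximation recursion
\[ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, \]
where the {\em estimates} $\theta_n$ take values in $\Re^d$ and $\{ \Phi_n \}$ is a Markov chain on a general state space.
In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, the associated \textit{mean flow} $\tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is assumed to be globally asymptotically stable, with stationary point denoted $\theta^*$. Here $\bar{f}(\theta)=\text{ E}[f(\theta,\Phi)]$, where $\Phi$ has the stationary distribution of the chain.
The main results are established under additional conditions on the mean flow and on the chain, the latter being a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3).
(i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$.
(ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply that the normalized covariance $\text{ E} [ z_n z_n^T ]$ converges to the asymptotic covariance $\Sigma^\Theta$ in the CLT, where $z_n= (\theta_n-\theta^*)/\sqrt{\alpha_n}$.
(iii) The CLT holds for the normalized version $z^{\text{ PR}}_n$ of the averaged parameters $\theta^{\text{ PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $\theta^{\text{ PR}}_n$ and $z^{\text{ PR}}_n$ converges to $\Sigma^{\text{ PR}}$, the minimal covariance of Polyak and Ruppert.
(iv) An example is given in which $f$ and $\bar{f}$ are linear in $\theta$ and the Markov chain is geometrically ergodic but does not satisfy (DV3). The algorithm is convergent, yet the second moment of $\theta_n$ is unbounded and in fact diverges.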
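The recursion and the averaged estimate can be illustrated with a small simulation. This is a minimal sketch and is not taken from the paper: the two-state chain `P`, the coefficients `a` and `b`, the step-size exponent `rho`, and the function name `simulate_sa` are all assumptions chosen for illustration. It uses a scalar linear $f(\theta,\phi) = b(\phi) - a(\phi)\theta$, step size $\alpha_n = n^{-\rho}$, and a running average of the iterates as a simple stand-in for $\theta^{\text{ PR}}_n$.

```python
import random


def simulate_sa(n_steps=100_000, rho=0.7, seed=0):
    """Scalar linear stochastic approximation driven by a two-state Markov chain.

    Everything here is an illustrative assumption, not the paper's setup.
    """
    rng = random.Random(seed)

    # Hypothetical two-state chain; P[i][j] = Prob(next = j | current = i).
    P = [[0.9, 0.1], [0.2, 0.8]]
    # Its stationary distribution, obtained by solving pi P = pi by hand.
    pi = [2.0 / 3.0, 1.0 / 3.0]

    # f(theta, phi) = b[phi] - a[phi] * theta (hypothetical coefficients).
    a = [1.0, 2.0]
    b = [1.0, 5.0]

    # Root of the mean vector field f_bar(theta) = E_pi[b] - E_pi[a] * theta.
    a_bar = pi[0] * a[0] + pi[1] * a[1]
    b_bar = pi[0] * b[0] + pi[1] * b[1]
    theta_star = b_bar / a_bar

    theta = 0.0     # theta_0
    theta_pr = 0.0  # running average of theta_1, ..., theta_n
    phi = 0         # Phi_0

    for n in range(1, n_steps + 1):
        # Sample Phi_n given Phi_{n-1}.
        phi = 0 if rng.random() < P[phi][0] else 1
        # Vanishing step size alpha_n = n^(-rho), with 1/2 < rho < 1.
        alpha = n ** (-rho)
        # theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, Phi_n)
        theta += alpha * (b[phi] - a[phi] * theta)
        # Incremental form of the average (1/n) * sum_{k<=n} theta_k.
        theta_pr += (theta - theta_pr) / n

    return theta, theta_pr, theta_star


if __name__ == "__main__":
    theta, theta_pr, theta_star = simulate_sa()
    print("theta_star      :", theta_star)
    print("last iterate    :", theta)
    print("averaged iterate:", theta_pr)
```

In this toy chain the drift condition is trivially satisfied because the state space is finite, so the sketch only illustrates the convergent regime and not the divergent second-moment behaviour of the counterexample in (iv).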

Abstract (original)

The paper concerns the stochastic approximation recursion, \[ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, \] where the {\em estimates} $\theta_n\in\Re^d$ and $ \{ \Phi_n \}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $ \tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta)=\text{ E}[f(\theta,\Phi)]$ with $\Phi$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain: (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\text{ E} [ z_n z_n^T ]$ to the asymptotic covariance $\Sigma^\Theta$ in the CLT, where $z_n= (\theta_n-\theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{ PR}}_n$ of the averaged parameters $\theta^{\text{ PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $\theta^{\text{ PR}}_n$ and $z^{\text{ PR}}_n$ converges to $\Sigma^{\text{ PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
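For orientation, the quantities in item (iii) can be written out using the standard textbook form of Polyak-Ruppert averaging. The abstract does not spell these out, so the symbols $A$ and $\Sigma_\Delta$, and the choice to average from $k=1$, are assumptions made here for illustration and may differ from the paper's exact definitions:

```latex
\[
\theta^{\text{PR}}_n = \frac{1}{n}\sum_{k=1}^{n}\theta_k ,
\qquad
z^{\text{PR}}_n = \sqrt{n}\,\bigl(\theta^{\text{PR}}_n - \theta^*\bigr),
\]
\[
\Sigma^{\text{PR}} = A^{-1}\,\Sigma_\Delta\,(A^{-1})^{T},
\qquad
A = \partial_\theta \bar{f}(\theta^*),
\qquad
\Sigma_\Delta = \sum_{k=-\infty}^{\infty}
  \text{E}\bigl[ f(\theta^*,\Phi_0)\, f(\theta^*,\Phi_k)^{T} \bigr],
\]
```

Here the expectation defining $\Sigma_\Delta$ is taken for the stationary version of the chain, and each term is a covariance because $\bar{f}(\theta^*) = 0$.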

arXiv information

Authors	Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
Published	2024-02-21 17:11:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
