Actor-Critic or Critic-Actor? A Tale of Two Time Scales

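Summary

We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower time scale.
This emulates policy iteration.
We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm.
We provide a proof of convergence, compare the two empirically with and without function approximation (using both linear and nonlinear function approximators), and observe that the proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.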
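The "two time scales" can be made concrete with step-size sequences. The block below is a generic sketch, assuming a discounted-reward TD error and two positive step sizes $a(n)$ (fast) and $b(n)$ (slow); the symbols $\delta_n$, $\gamma$, $\theta$ and the exact update form are illustrative assumptions, not the paper's equations. The standard two-time-scale conditions are summable squares, non-summable sums, and $b(n)/a(n) \to 0$. Actor-critic gives the critic the fast step; critic-actor swaps the roles.

```latex
% Standard two-time-scale step-size conditions (a = fast, b = slow)
\sum_n a(n) = \sum_n b(n) = \infty, \qquad
\sum_n \big(a(n)^2 + b(n)^2\big) < \infty, \qquad
\frac{b(n)}{a(n)} \to 0 .

% TD error at step n (assumed discounted setting)
\delta_n = r_n + \gamma V_n(s_{n+1}) - V_n(s_n)

% Actor-critic: critic fast, actor slow  (emulates policy iteration)
V_{n+1}(s_n) = V_n(s_n) + a(n)\,\delta_n, \qquad
\theta_{n+1} = \theta_n + b(n)\,\delta_n \nabla_\theta \log \pi_{\theta_n}(a_n \mid s_n)

% Critic-actor: critic slow, actor fast  (emulates value iteration)
V_{n+1}(s_n) = V_n(s_n) + b(n)\,\delta_n, \qquad
\theta_{n+1} = \theta_n + a(n)\,\delta_n \nabla_\theta \log \pi_{\theta_n}(a_n \mid s_n)
```

On the fast scale the slow iterate looks frozen, while on the slow scale the fast iterate looks already converged; this is why swapping which update gets $a(n)$ changes which classical dynamic-programming scheme is emulated.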

Summary (original)

We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy iteration. We observe that reversal of the time scales will in fact emulate value iteration and is a legitimate algorithm. We provide a proof of convergence and compare the two empirically with and without function approximation (with both linear and nonlinear function approximators) and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
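The reversal described above can be tried on a toy problem. The sketch below is a hypothetical illustration, not the authors' code: it assumes a small random discounted MDP, a softmax (Boltzmann) actor updated along the TD-error policy gradient, per-state visit counts as step-size clocks, and step sizes `1/k**0.6` (fast) and `1/k` (slow). The functions `make_mdp`, `run`, and `value_iteration` and the flag `critic_fast` are names introduced here; only the choice of which step size the critic receives differs between the two modes.

```python
import numpy as np


def make_mdp(n_states=4, n_actions=2, seed=0):
    """Hypothetical random finite MDP used only for illustration."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a] is a distribution over next states
    R = rng.random((n_states, n_actions))  # rewards in [0, 1)
    return P, R


def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()


def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Exact optimal values, for comparison with the learned critic."""
    V = np.zeros(R.shape[0])
    while True:
        V_new = (R + gamma * P @ V).max(axis=1)  # P @ V has shape (n_s, n_a)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def run(P, R, gamma=0.9, n_steps=50_000, critic_fast=True, seed=1):
    """Tabular actor-critic (critic_fast=True) or critic-actor (critic_fast=False).

    Assumed setup: softmax actor with TD-error policy-gradient updates.
    The two modes differ only in which iterate gets the larger step size.
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    V = np.zeros(n_s)             # critic: state-value estimates
    theta = np.zeros((n_s, n_a))  # actor: action preferences
    visits = np.zeros(n_s)
    s = 0
    for _ in range(n_steps):
        visits[s] += 1
        k = visits[s]
        fast = 1.0 / k ** 0.6  # decays more slowly -> faster time scale
        slow = 1.0 / k         # slow / fast -> 0
        step_critic, step_actor = (fast, slow) if critic_fast else (slow, fast)

        pi = softmax(theta[s])
        a = rng.choice(n_a, p=pi)
        s_next = rng.choice(n_s, p=P[s, a])

        delta = R[s, a] + gamma * V[s_next] - V[s]  # TD error
        V[s] += step_critic * delta

        grad_log_pi = -pi
        grad_log_pi[a] += 1.0  # d/dtheta[s] of log pi(a | s) for a softmax policy
        theta[s] += step_actor * delta * grad_log_pi
        s = s_next
    return V, theta


if __name__ == "__main__":
    P, R = make_mdp()
    print("V*            ", np.round(value_iteration(P, R), 2))
    print("actor-critic  ", np.round(run(P, R, critic_fast=True)[0], 2))
    print("critic-actor  ", np.round(run(P, R, critic_fast=False)[0], 2))
```

Keeping everything except the step-size assignment identical makes the comparison isolate the time-scale ordering; any gap between the learned values and `V*` here reflects this toy setup and finite run length, not the results reported in the paper.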

arXiv information

Authors	Shalabh Bhatnagar,Vivek S. Borkar,Soumyajit Guin
Published	2023-05-26 15:23:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
