n-Step Temporal Difference Learning with Optimal n

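Summary

Title: n-Step Temporal Difference Learning with Optimal n

Summary:
– The paper addresses the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm.
– The optimal n is found with a model-free, one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure.
– The procedure is adapted to the discrete optimization setting by a random projection approach.
– Using a differential inclusions approach, the authors prove convergence of the proposed algorithm, SDPSA, and show that it finds the optimal value of n in n-step TD.
– Experiments show that SDPSA reaches the optimal value of n from arbitrary initial values.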
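The quantity being tuned is the lookahead n in the n-step TD update V(S_τ) ← V(S_τ) + α[G_{τ:τ+n} − V(S_τ)]. The update uses n actual rewards before bootstrapping from V. Below is a minimal tabular sketch in Python of the textbook (Sutton and Barto) form of n-step TD prediction. The 5-state random walk, the step size, and the function names `n_step_return` and `n_step_td_random_walk` are illustrative assumptions. They are not the paper's experimental setup.

```python
import random


def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum r_1 + gamma*r_2 + ... + gamma^(k-1)*r_k plus
    gamma^k * bootstrap_value, with k = len(rewards).
    Pass bootstrap_value=0.0 when the episode ended inside the window."""
    g = 0.0
    for i, r in enumerate(rewards):
        g += (gamma ** i) * r
    return g + (gamma ** len(rewards)) * bootstrap_value


def n_step_td_random_walk(n, alpha=0.1, gamma=1.0, episodes=200,
                          n_states=5, seed=0):
    """Tabular n-step TD prediction on a hypothetical random walk: states
    0..n_states-1, start in the middle, move left/right with probability 1/2,
    reward +1 only for exiting on the right, 0 otherwise."""
    rng = random.Random(seed)
    V = [0.5] * n_states
    for _ in range(episodes):
        s = n_states // 2
        states = [s]          # states[t] = S_t (non-terminal states only)
        rewards = [0.0]       # rewards[t] = R_t; index 0 is a placeholder
        T = float("inf")      # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                s_next = s + rng.choice((-1, 1))
                if s_next < 0 or s_next >= n_states:
                    rewards.append(1.0 if s_next >= n_states else 0.0)
                    T = t + 1
                else:
                    rewards.append(0.0)
                    states.append(s_next)
                    s = s_next
            tau = t - n + 1   # time step whose estimate is updated now
            if tau >= 0:
                end = min(tau + n, T)
                # Bootstrap from V(S_{tau+n}) only if that state is non-terminal.
                boot = V[states[tau + n]] if tau + n < T else 0.0
                G = n_step_return(rewards[tau + 1:end + 1], boot, gamma)
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```

A brute-force alternative is to run this for every n in a range and compare against the true values (k+1)/(n_states+1). The abstract's approach instead searches over n with stochastic approximation.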

Summary (original)

We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adapt to the discrete optimization setting by using a random projection approach. We prove the convergence of our proposed algorithm, SDPSA, using a differential inclusions approach and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for arbitrary initial values.
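Below is a rough sketch of the kind of search over n described above. Each iteration does three things:

1. Run one noisy simulation at a randomly perturbed point.
2. Form a one-measurement SPSA gradient estimate from it.
3. Use randomized rounding to map real-valued iterates onto the integer grid.

This is not the authors' SDPSA. Several choices are assumptions:

- the randomized-rounding form of the random projection;
- the step-size and perturbation schedules;
- the bounds on n;
- the toy cost `toy_cost`, a hypothetical stand-in for, e.g., the error of n-step TD at a given n.

```python
import math
import random


def random_projection(theta, n_min, n_max, rng):
    """Clip theta to [n_min, n_max], then round it randomly to a neighbouring
    integer: up with probability theta - floor(theta), down otherwise.
    (Assumed form of the random projection.)"""
    theta = min(max(theta, n_min), n_max)
    lo = math.floor(theta)
    return lo + 1 if rng.random() < theta - lo else lo


def toy_cost(n, rng):
    """Hypothetical noisy cost of using lookahead n (a stand-in for, e.g., the
    prediction error of n-step TD); its minimum is at n = 4."""
    return (n - 4) ** 2 / 10.0 + rng.gauss(0.0, 0.1)


def one_sim_discrete_spsa(cost, n_init, n_min, n_max, iters=2000,
                          a=0.5, c=1.0, seed=0):
    """One-simulation SPSA over integer n, with randomized rounding as the
    projection onto the grid (a sketch, not the paper's SDPSA)."""
    rng = random.Random(seed)
    theta = float(n_init)                       # continuous iterate
    for k in range(iters):
        a_k = a / (k + 1) ** 0.602              # decaying step size (common SPSA exponent)
        c_k = c / (k + 1) ** 0.101              # decaying perturbation size
        delta = rng.choice((-1.0, 1.0))         # symmetric +/-1 perturbation
        n_eval = random_projection(theta + c_k * delta, n_min, n_max, rng)
        y = cost(n_eval, rng)                   # the single simulation this iteration
        g_hat = y / (c_k * delta)               # one-measurement gradient estimate
        theta = min(max(theta - a_k * g_hat, n_min), n_max)
    # Map the final continuous iterate back to an integer n.
    return random_projection(theta, n_min, n_max, rng)


if __name__ == "__main__":
    print(one_sim_discrete_spsa(toy_cost, n_init=15, n_min=1, n_max=20))
```

For a smooth cost, E[y_k/(c_k Δ_k)] equals the gradient up to O(c_k) terms, because E[1/Δ_k] = 0 removes the f(θ_k) term. The price of using a single measurement is a higher variance than the two-measurement SPSA estimate.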

arXiv information

Authors	Lakshmi Mandal, Shalabh Bhatnagar
Published	2023-04-14 13:09:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI