Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

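Summary

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation.
Previous convergence rates have typically been established under the assumption of linearly independent features, which fails to hold in many practical scenarios.
This paper instead establishes the first $L^2$ convergence rates for linear TD($\lambda$) operating under arbitrary features, without any algorithmic modification or additional assumptions.
The results apply to both the discounted and the average-reward settings.
To handle the potential non-uniqueness of solutions caused by arbitrary features, the authors develop a new stochastic approximation result that characterizes the rate of convergence to the solution set rather than to a single point.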

Summary (original)

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold in many practical scenarios. This paper instead establishes the first $L^2$ convergence rates for linear TD($\lambda$) operating under arbitrary features, without making any algorithmic modification or additional assumptions. Our results apply to both the discounted and average-reward settings. To address the potential non-uniqueness of solutions resulting from arbitrary features, we develop a novel stochastic approximation result featuring convergence rates to the solution set instead of a single point.
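
To make the setting concrete, here is a minimal sketch of linear TD($\lambda$) with accumulating eligibility traces in the discounted setting, run on features that are deliberately not linearly independent. It is an illustration under stated assumptions, not the paper's code: the three-state Markov reward process, the names `td_lambda`, `FEATURES`, `TRANSITIONS` and `REWARDS`, and the step-size schedule are all hypothetical choices made for this example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda(features, transitions, rewards, gamma=0.9, lam=0.5,
              steps=20000, seed=0):
    """Linear TD(lambda) with accumulating traces on a Markov reward process.

    features[s] is the feature vector of state s, transitions[s] the
    next-state distribution, and rewards[s] the reward for leaving s.
    """
    rng = random.Random(seed)
    n_states = len(features)
    dim = len(features[0])
    w = [0.0] * dim          # weight vector
    trace = [0.0] * dim      # eligibility trace
    state = 0
    for t in range(steps):
        next_state = rng.choices(range(n_states), weights=transitions[state])[0]
        x, x_next = features[state], features[next_state]
        # Temporal difference error for the observed transition.
        delta = rewards[state] + gamma * dot(w, x_next) - dot(w, x)
        # Decay the trace, then add the current feature vector.
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        # Decaying step size (an assumed schedule, chosen for illustration).
        alpha = 0.1 / (1.0 + t / 1000.0)
        w = [wi + alpha * delta * ei for wi, ei in zip(w, trace)]
        state = next_state
    return w


# Hypothetical example: the first two feature columns are identical,
# so the feature matrix does not have linearly independent columns.
FEATURES = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
TRANSITIONS = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
REWARDS = [1.0, 0.0, 2.0]

w = td_lambda(FEATURES, TRANSITIONS, REWARDS)
print("weights:", w)
print("value estimates:", [dot(w, x) for x in FEATURES])
```

Because the first two feature columns coincide, any two weight vectors with the same sum `w[0] + w[1]` and the same `w[2]` produce identical value estimates. The weights that solve the TD fixed-point equation therefore form a set rather than a single point, which is the non-uniqueness the abstract refers to.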

arXiv information

Authors	Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang
Published	2025-05-27 16:17:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
