Non-stationary Reinforcement Learning under General Function Approximation

Summary

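General function approximation is a powerful tool for handling large state and action spaces across a broad range of reinforcement learning (RL) scenarios.
However, the theoretical understanding of non-stationary MDPs under general function approximation is still limited.
In this paper, we make the first such attempt.
We first propose a new complexity measure for non-stationary MDPs, called the dynamic Bellman Eluder (DBE) dimension, which subsumes the majority of existing tractable RL problems in both static and non-stationary MDPs.
Based on this complexity measure, we propose a new confidence-set-based, model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs.
We then establish an upper bound on the dynamic regret of the proposed algorithm and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
Furthermore, through examples of non-stationary linear and tabular MDPs, we show that the algorithm performs better than existing UCB-type algorithms in the small variation budget scenario.
To the best of our knowledge, this is the first dynamic regret analysis for non-stationary MDPs with general function approximation.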
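
The sliding window idea mentioned above can be illustrated with a minimal sketch. This is not SW-OPEA itself, and the abstract does not specify how the window is implemented; the class `SlidingWindowBuffer`, the helper `mean_reward`, the window size of 20, and the toy reward sequence are all hypothetical choices made for illustration. The sketch only shows why discarding old episodes helps an estimate track an environment that changes over time.

```python
from collections import deque


class SlidingWindowBuffer:
    """Keeps only the most recent `window` episodes (hypothetical helper)."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) evicts the oldest item automatically when full.
        self._episodes = deque(maxlen=window)

    def add(self, episode):
        self._episodes.append(episode)

    def recent(self):
        return list(self._episodes)


def mean_reward(episodes):
    """Plain average; stands in for any estimate built from stored data."""
    return sum(episodes) / len(episodes)


# Toy non-stationary signal (an assumption for illustration):
# the reward jumps from 0.0 to 1.0 at episode 50.
rewards = [0.0] * 50 + [1.0] * 50

buffer = SlidingWindowBuffer(window=20)
for r in rewards:
    buffer.add(r)

# Estimate from the last 20 episodes only, versus from the full history.
windowed_estimate = mean_reward(buffer.recent())
full_history_estimate = mean_reward(rewards)
```

In this toy setting the windowed estimate reflects only post-change data, while the full-history estimate is still pulled toward the outdated regime. A smaller window adapts faster but uses fewer samples, which is the kind of trade-off a variation budget is meant to quantify.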

Abstract (original)

General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.

arXiv information

Authors	Songtao Feng, Ming Yin, Ruiquan Huang, Yu-Xiang Wang, Jing Yang, Yingbin Liang
Published	2023-06-01 16:19:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
