Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning

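Summary

Reinforcement learning (RL) problems are being studied under increasingly complex structures.
While tabular and linear models have been thoroughly investigated, the analytical study of RL with nonlinear function approximation, especially kernel-based models, has recently gained traction because of its strong representational capacity and theoretical tractability.
In this context, we examine statistical efficiency in kernel-based RL within the reward-free RL framework, specifically asking: how many samples are required to design a near-optimal policy?
Existing work addresses this question under restrictive assumptions on the class of kernel functions.
We first explore the question assuming a generative model, then relax that assumption at the cost of increasing the sample complexity by a factor of H, the episode length.
We tackle this fundamental problem for a broad class of kernels and with a simpler algorithm than prior work.
Our approach derives new confidence intervals for kernel ridge regression that are specific to the RL setting.
We further validate our theoretical findings through simulations.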

Summary (original)

Reinforcement Learning (RL) problems are being considered under increasingly more complex structures. While tabular and linear models have been thoroughly explored, the analytical study of RL under nonlinear function approximation, especially kernel-based models, has recently gained traction for their strong representational capacity and theoretical tractability. In this context, we examine the question of statistical efficiency in kernel-based RL within the reward-free RL framework, specifically asking: how many samples are required to design a near-optimal policy? Existing work addresses this question under restrictive assumptions about the class of kernel functions. We first explore this question by assuming a generative model, then relax this assumption at the cost of increasing the sample complexity by a factor of H, the length of the episode. We tackle this fundamental problem using a broad class of kernels and a simpler algorithm compared to prior work. Our approach derives new confidence intervals for kernel ridge regression, specific to our RL setting, which may be of broader applicability. We further validate our theoretical findings through simulations.
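
For readers unfamiliar with the confidence intervals the abstract refers to, the following is a minimal sketch of the textbook kernel ridge regression predictor and its uncertainty width, which together give an interval of the form mean ± beta · std. It is not the paper's algorithm or its new bound: the squared-exponential kernel, the regularizer `lam`, the `lengthscale`, the multiplier `beta`, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def krr_predict(x_train, y_train, x_test, lam=0.1, lengthscale=0.5):
    # Standard kernel ridge regression mean and uncertainty width.
    K = rbf_kernel(x_train, x_train, lengthscale)
    k_star = rbf_kernel(x_train, x_test, lengthscale)   # shape (n, m)
    A = K + lam * np.eye(len(x_train))
    mean = k_star.T @ np.linalg.solve(A, y_train)
    # k(x, x) = 1 for this kernel, so the variance is 1 - k*^T A^{-1} k*.
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy data (assumed): noisy samples of a sine function on [0, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(30, 1))
y_train = np.sin(2.0 * np.pi * x_train[:, 0]) + 0.05 * rng.standard_normal(30)

# One test point inside the sampled region, one far outside it.
x_test = np.array([[0.5], [3.0]])
mean, std = krr_predict(x_train, y_train, x_test)

beta = 2.0  # hypothetical confidence multiplier, not the paper's value
lower, upper = mean - beta * std, mean + beta * std
```

The interval is narrow where training points are dense and widens toward the prior scale far from the data, which is the quantity that sample-complexity analyses of this kind typically control.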

arXiv information

Authors	Aya Kayal, Sattar Vakili, Laura Toni, Alberto Bernacchia
Published	2025-02-11 17:15:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
