Theoretical Barriers in Bellman-Based Reinforcement Learning

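Abstract

Reinforcement learning algorithms designed for high-dimensional spaces often enforce the Bellman equation on a sampled subset of states and rely on generalization to propagate knowledge across the whole state space.
This paper identifies and formalizes a fundamental limitation of this common approach.
Specifically, it constructs counterexample problems with a simple structure that this approach cannot exploit.
The findings show that such algorithms can neglect critical information about the problem, which leads to inefficiency.
The negative result is further extended to another approach from the literature: Hindsight Experience Replay, which learns state-to-state reachability.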

Abstract (original)

Reinforcement Learning algorithms designed for high-dimensional spaces often enforce the Bellman equation on a sampled subset of states, relying on generalization to propagate knowledge across the state space. In this paper, we identify and formalize a fundamental limitation of this common approach. Specifically, we construct counterexample problems with a simple structure that this approach fails to exploit. Our findings reveal that such algorithms can neglect critical information about the problems, leading to inefficiencies. Furthermore, we extend this negative result to another approach from the literature: Hindsight Experience Replay learning state-to-state reachability.
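
As a rough illustration of what "enforcing the Bellman equation on a sampled subset of states" can look like, the sketch below fits a linear value function on a small deterministic chain. Everything in it is an assumption made for illustration and is not the paper's construction: the chain environment, the polynomial features, the function names features and bellman_residuals, and all parameter values are hypothetical. It solves the sampled Bellman equations by least squares, then reports the largest Bellman residual on the sampled states and on all states.

```python
import numpy as np

def features(states, n_states, degree):
    # Polynomial features of the normalized state index (an illustrative choice).
    x = states / (n_states - 1)
    return np.vander(x, degree + 1, increasing=True)

def bellman_residuals(n_states=20, n_samples=6, degree=3, gamma=0.9, seed=0):
    """Fit V(s) = phi(s) @ w using the Bellman equation on sampled states only.

    Hypothetical environment: state i steps to i + 1, and reaching the last
    state gives reward 1 and ends the episode (terminal value fixed to 0).
    """
    rng = np.random.default_rng(seed)
    states = np.arange(n_states - 1)          # non-terminal states 0..n-2
    next_states = states + 1                  # deterministic step to the right
    terminal = next_states == n_states - 1
    rewards = terminal.astype(float)          # reward 1 on reaching the end

    phi = features(states, n_states, degree)
    phi_next = features(next_states, n_states, degree)
    phi_next[terminal] = 0.0                  # terminal value is fixed to 0

    # Bellman equation in the linear model: (phi - gamma * phi_next) @ w = rewards
    a = phi - gamma * phi_next
    sampled = rng.choice(n_states - 1, size=n_samples, replace=False)
    w, *_ = np.linalg.lstsq(a[sampled], rewards[sampled], rcond=None)

    # Largest absolute Bellman residual on the sampled states and on all states.
    residual = np.abs(a @ w - rewards)
    return residual[sampled].max(), residual.max()

if __name__ == "__main__":
    on_sample, on_all = bellman_residuals()
    print("max residual on sampled states:", on_sample)
    print("max residual on all states:", on_all)
```

The fit only constrains the sampled states, so the residual elsewhere depends entirely on how the chosen features generalize. That dependence on generalization is what the abstract refers to.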

arXiv information

Authors	Brieuc Pinon, Raphaël Jungers, Jean-Charles Delvenne
Published	2025-02-17 16:18:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
