Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning

Abstract

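Reinforcement learning (RL) algorithms for large state spaces rely heavily on supervised learning subroutines to estimate objects such as value functions and transition probabilities.
Because only the simplest supervised learning problems can be solved provably and efficiently, the practical performance of an RL algorithm depends on which of these supervised learning "oracles" it assumes access to (and on how they are implemented).
But which oracles are better, and which are worse?
Is there a minimal oracle?
This work clarifies how the choice of supervised learning oracle affects the computational complexity of RL, as quantified by oracle strength.
First, for the task of reward-free exploration in Block MDPs under the standard episodic access model (a ubiquitous setting for RL with function approximation), it identifies two-context regression as a minimal oracle, that is, an oracle that is both necessary and sufficient (under a mild regularity assumption).
Second, it identifies one-context regression as a near-minimal oracle in the stronger reset access model, establishing along the way a provable computational benefit of resets.
Third, it broadens the focus to Low-Rank MDPs, where it gives cryptographic evidence that the analog of the Block MDP oracle is insufficient.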

Abstract (original)

Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning ‘oracles’ it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a minimal oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model — a ubiquitous setting for RL with function approximation — we identify two-context regression as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify one-context regression as a near-minimal oracle in the stronger reset access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to Low-Rank MDPs, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.

arXiv information

Authors	Dhruv Rohatgi, Dylan J. Foster
Published	2025-02-12 18:47:13+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
