Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Summary

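Most reinforcement learning methods are strongly affected by the computational effort and data needed to obtain effective estimates of action-value functions, which in turn determine overall performance and the sample efficiency of the learning procedure.
Action-value functions are typically estimated through an iterative scheme that alternates between applying an empirical approximation of the Bellman operator and projecting the result onto the chosen function space.
It has been observed that this scheme can potentially be generalized to carry out multiple iterations of the Bellman operator at once, which benefits the underlying learning algorithm.
Until now, however, implementing this idea effectively has been difficult, especially in high-dimensional problems.
This paper introduces iterated $Q$-Network (i-QN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions, each of which serves as the target for the next.
We show that i-QN is theoretically grounded and can be used seamlessly in value-based and actor-critic methods.
We empirically demonstrate the advantages of i-QN on Atari $2600$ games and MuJoCo continuous control problems.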
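
The idea of a sequence of action-value functions in which each one serves as the target for the next can be illustrated with a small sketch. This is not the authors' implementation: it assumes tabular $Q$-functions instead of neural networks, keeps the first table frozen, and uses a plain temporal-difference step. The names `iterated_td_targets` and `iterated_td_step`, the learning rate, and the batch layout are hypothetical choices made for illustration only.

```python
import numpy as np


def iterated_td_targets(q_tables, rewards, next_states, dones, gamma):
    """Build one target per learned table.

    q_tables has shape (K + 1, n_states, n_actions). The target for
    table k (k = 1..K) is computed from table k - 1, so K consecutive
    Bellman updates are represented at once. Returns shape (K, batch).
    """
    # Greedy value of the next state under tables 0..K-1.
    next_max = q_tables[:-1, next_states, :].max(axis=-1)
    return rewards + gamma * (1.0 - dones) * next_max


def iterated_td_step(q_tables, batch, gamma=0.99, lr=0.5):
    """One TD step on tables 1..K; table 0 stays frozen (assumption)."""
    states, actions, rewards, next_states, dones = batch
    # Targets are computed from the current tables and treated as constants.
    targets = iterated_td_targets(q_tables, rewards, next_states, dones, gamma)
    new_tables = q_tables.copy()
    for k in range(1, q_tables.shape[0]):
        td_error = targets[k - 1] - q_tables[k, states, actions]
        # Accumulate updates in place, also for repeated (state, action) pairs.
        np.add.at(new_tables[k], (states, actions), lr * td_error)
    return new_tables
```

With two learned tables, repeated calls to `iterated_td_step` move table 1 toward one Bellman update of table 0 and table 2 toward one Bellman update of table 1, that is, two updates of table 0. How the tables are parameterized and how the frozen table is refreshed in a deep-learning setting are design choices this sketch does not cover.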

Summary (original)

The vast majority of Reinforcement Learning methods is largely impacted by the computation effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample-efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator and a subsequent projection step onto a considered function space. It has been observed that this scheme can be potentially generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, till now, it has been challenging to effectively implement this idea, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (i-QN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of i-QN in Atari $2600$ games and MuJoCo continuous control problems.

arXiv information

Authors	Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D’Eramo
Published	2024-10-24 16:50:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
