Two-Memory Reinforcement Learning

Summary

【Title】Reinforcement Learning Using Two Memories
【Summary】
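– Deep reinforcement learning has achieved important empirical successes, but it tends to learn relatively slowly because reward information propagates slowly and parametric neural networks update slowly.
– Non-parametric episodic memory, by contrast, offers a faster-learning alternative: it requires no representation learning and selects actions by using the maximum episodic return as the state-action value.
– Episodic memory and reinforcement learning each have their own strengths and weaknesses.
– Motivated by the observation that humans use multiple memory systems concurrently while learning and benefit from all of them, this work proposes a method that combines episodic memory with reinforcement learning.
– The proposed 2M agent lets the speed of episodic memory and the optimality and generalization capacity of reinforcement learning complement each other. Experiments show that it is more data efficient than, and outperforms, pure episodic memory, pure reinforcement learning, and a state-of-the-art memory-augmented RL agent.
– The proposed approach also provides a general framework for combining any episodic memory agent with other off-policy reinforcement learning algorithms.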
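The episodic-memory idea in the summary (use the maximum episodic return as the state-action value) can be illustrated with a small tabular sketch. It assumes discrete, hashable states and a discount factor `gamma`; the class name `EpisodicMemory` and its methods are hypothetical, and the nearest-neighbour lookup that practical episodic-control agents use for unseen states is deliberately left out.

```python
import random


class EpisodicMemory:
    """Hypothetical tabular episodic memory.

    Q_EM(s, a) is the best discounted return ever observed after taking
    action a in state s. Assumes discrete, hashable states; no function
    approximation or nearest-neighbour generalization is used.
    """

    def __init__(self, n_actions, gamma=0.99):
        self.n_actions = n_actions
        self.gamma = gamma
        self.table = {}  # (state, action) -> max discounted return seen so far

    def update(self, trajectory):
        """trajectory: list of (state, action, reward) for one finished episode."""
        ret = 0.0
        # Walk backwards so each step's discounted return is built in one pass.
        for state, action, reward in reversed(trajectory):
            ret = reward + self.gamma * ret
            key = (state, action)
            # Keep the maximum return ever observed, not an average.
            self.table[key] = max(self.table.get(key, float("-inf")), ret)

    def q(self, state, action):
        # Unvisited pairs are worth -inf so they never beat a recorded return.
        return self.table.get((state, action), float("-inf"))

    def act(self, state):
        values = [self.q(state, a) for a in range(self.n_actions)]
        best = max(values)
        if best == float("-inf"):
            # Nothing stored for this state yet: fall back to a random action.
            return random.randrange(self.n_actions)
        return values.index(best)


# Usage: a lucky episode followed by an unlucky one over the same path.
mem = EpisodicMemory(n_actions=2, gamma=0.9)
mem.update([("s0", 1, 0.0), ("s1", 0, 1.0)])
mem.update([("s0", 1, 0.0), ("s1", 0, 0.0)])
print(mem.q("s0", 1))  # 0.9: the better (first) return is kept
print(mem.act("s0"))   # 1
```

Because only the maximum is stored, a single good episode immediately changes the greedy action, which is the "fast learning" the summary refers to; the same rule also makes the memory optimistic in stochastic environments.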

Summary (original)

While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to the slow propagation of reward information and the slow updates of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and the generalization capacity of the reinforcement learning part to complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.
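The abstract does not spell out how the two memories are combined, so the following is only a minimal sketch of the general pattern under loudly stated assumptions. At the start of each episode, one memory is chosen at random with a fixed probability `p_em` to drive action selection (an assumed rule, not the paper's schedule). Tabular Q-learning stands in for the deep off-policy RL part, and the episodic memory keeps the maximum discounted return per state-action pair. `ChainEnv`, `greedy`, and `train_two_memory` are hypothetical names. The point it illustrates is why the framework needs an off-policy learner: the RL part must learn from transitions that the episodic-memory policy generated.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy task: move right along a chain; reward 1 at the last state."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, anything else moves left; the left end is a wall.
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        terminal = self.pos == self.length - 1
        return self.pos, (1.0 if terminal else 0.0), terminal


def greedy(values, rng):
    """Argmax with random tie-breaking."""
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def train_two_memory(env, episodes=200, gamma=0.9, alpha=0.5, epsilon=0.1,
                     p_em=0.5, seed=0):
    rng = random.Random(seed)
    n_actions = 2
    q_rl = defaultdict(float)  # off-policy learner: tabular Q-learning
    q_em = {}                  # episodic memory: max discounted return per (s, a)
    returns = []
    for _ in range(episodes):
        # Assumed switching rule: one memory drives behaviour for the whole episode.
        use_em = rng.random() < p_em
        state, trajectory = env.reset(), []
        for _ in range(env.max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            elif use_em:
                action = greedy([q_em.get((state, a), float("-inf"))
                                 for a in range(n_actions)], rng)
            else:
                action = greedy([q_rl[(state, a)] for a in range(n_actions)], rng)
            next_state, reward, terminal = env.step(action)
            # Q-learning is off-policy, so it also learns from transitions
            # generated while the episodic-memory policy was acting.
            bootstrap = 0.0 if terminal else max(q_rl[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * bootstrap
            q_rl[(state, action)] += alpha * (target - q_rl[(state, action)])
            trajectory.append((state, action, reward))
            state = next_state
            if terminal:
                break
        # Episodic memory: Monte Carlo returns, keeping the maximum per (s, a).
        ret = 0.0
        for s, a, r in reversed(trajectory):
            ret = r + gamma * ret
            q_em[(s, a)] = max(q_em.get((s, a), float("-inf")), ret)
        returns.append(sum(r for _, _, r in trajectory))
    return q_rl, q_em, returns


q_rl, q_em, returns = train_two_memory(ChainEnv(), episodes=200, seed=0)
print(sum(returns[-50:]), "of the last 50 episodes reached the goal")
```

Sharing every transition with both learners is what lets the fast episodic memory steer exploration early on while the slower value-based learner catches up; any other off-policy learner could replace the Q-learning update in this sketch.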

arXiv information

Authors	Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
Published	2023-04-23 09:29:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
