The primacy bias in Model-based RL

Summary

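Primacy bias in deep reinforcement learning (DRL) is the tendency of an agent to overfit early data and lose the ability to learn from new data, and it can substantially degrade the performance of DRL algorithms.
Prior work has shown that simple techniques, such as resetting the agent's parameters, can greatly reduce primacy bias.
However, resetting the agent's parameters is observed to hurt performance in model-based reinforcement learning (MBRL).
Further investigation shows that primacy bias in MBRL differs from that in model-free RL.
This work investigates primacy bias in MBRL and proposes world model resetting, a reset scheme that works in the MBRL setting.
The method is applied to two MBRL algorithms, MBPO and DreamerV2.
Its effectiveness is evaluated on several continuous control tasks in MuJoCo and the DeepMind Control Suite, and on discrete control tasks in the Atari 100k benchmark.
The results show that world model resetting substantially reduces primacy bias in the model-based setting and improves algorithm performance.
The work also provides guidance on how to perform world model resetting effectively.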
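
The summary does not say how the reset is scheduled or exactly which parameters are reinitialized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a fixed reset interval, a toy `TinyNet` class standing in for a neural network, and a replay buffer that is kept across resets; all names (`TinyNet`, `train`, `reset_interval`) are hypothetical.

```python
import random


class TinyNet:
    """Toy stand-in for a neural network: a flat list of weights."""

    def __init__(self, size, rng):
        self.size = size
        self.rng = rng
        self.reset()

    def reset(self):
        # Reinitialize every weight from the initial distribution
        # and forget how many updates have been applied.
        self.weights = [self.rng.gauss(0.0, 0.1) for _ in range(self.size)]
        self.updates = 0

    def train_step(self):
        # Placeholder for one gradient update.
        self.weights = [w + 0.01 for w in self.weights]
        self.updates += 1


def train(total_steps, reset_interval, seed=0):
    """Train a world model and a policy, periodically resetting only the world model.

    Assumptions (not taken from the summary): a fixed reset interval,
    no reset on the final step, and a replay buffer that survives resets.
    """
    rng = random.Random(seed)
    world_model = TinyNet(4, rng)
    policy = TinyNet(4, rng)
    replay_buffer = []
    resets = 0

    for step in range(1, total_steps + 1):
        replay_buffer.append(step)  # placeholder for an environment transition
        world_model.train_step()
        policy.train_step()

        if step % reset_interval == 0 and step < total_steps:
            # Only the world model is reinitialized; the policy and
            # the collected data are left untouched.
            world_model.reset()
            resets += 1

    return world_model, policy, replay_buffer, resets


if __name__ == "__main__":
    wm, pi, buf, n = train(total_steps=10, reset_interval=4)
    print(n, wm.updates, pi.updates, len(buf))
```

The contrast with the agent-parameter reset mentioned in the summary is that the policy here is never reinitialized; whether and how often to reset in practice is exactly the kind of choice the paper's guidance is said to address.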

Summary (original)

The primacy bias in deep reinforcement learning (DRL), which refers to the agent’s tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent’s parameters, can substantially alleviate the primacy bias. However, we observe that resetting the agent’s parameters harms its performance in the context of model-based reinforcement learning (MBRL). In fact, on further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we focus on investigating the primacy bias in MBRL and propose world model resetting, which works in MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate the effectiveness of our method on multiple continuous control tasks on MuJoCo and DeepMind Control Suite, as well as discrete control tasks on Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in model-based setting and improve algorithm’s performance. We also give a guide on how to perform world model resetting effectively.

arXiv information

Authors	Zhongjian Qiao, Jiafei Lyu, Xiu Li
Published	2023-10-23 15:12:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
