Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting

Summary

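Early stopping based on validation-set performance is a common way to find the right balance between underfitting and overfitting in supervised learning.
In reinforcement learning, however, early stopping is not applicable, even for supervised sub-problems such as world model learning, because the dataset keeps evolving.
As a solution, we propose a new general method that dynamically adjusts the update-to-data (UTD) ratio during training. The adjustment is based on detecting underfitting and overfitting on a small subset of the continuously collected experience that is not used for training.
We apply this method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark.
The results show that, compared with the default settings of DreamerV2, adjusting the UTD ratio with our approach better balances under- and overestimation. They also show that the approach is competitive with an extensive hyperparameter search, which is infeasible in many applications.
Our method removes the need to set the UTD hyperparameter by hand. It also yields greater robustness to other learning-related hyperparameters, further reducing the amount of tuning required.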
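The method rests on two mechanical pieces. The first is a small share of the collected experience that is never trained on. The second is a UTD ratio that sets how many gradient updates are performed per unit of collected data.

The following is a minimal sketch of one way to implement both pieces. It makes two assumptions: whole episodes are routed to the held-out split, and an accumulator converts a fractional ratio into whole updates. `HeldOutReplay`, `UpdateScheduler`, `val_fraction`, and the 0.1 default are hypothetical names and values. They are not taken from the paper or from DreamerV2.

```python
import random
from typing import List


class HeldOutReplay:
    """Hypothetical buffer that keeps a small validation split of collected episodes out of training."""

    def __init__(self, val_fraction: float = 0.1, seed: int = 0) -> None:
        self.val_fraction = val_fraction  # assumed share of episodes held out (not from the paper)
        self.rng = random.Random(seed)
        self.train: List[list] = []  # episodes the world model is trained on
        self.val: List[list] = []    # episodes used only to measure validation loss

    def add_episode(self, episode: list) -> None:
        # Route each finished episode to exactly one split, so held-out data is never trained on.
        if self.rng.random() < self.val_fraction:
            self.val.append(episode)
        else:
            self.train.append(episode)


class UpdateScheduler:
    """Hypothetical helper: converts a possibly fractional UTD ratio into whole gradient updates."""

    def __init__(self) -> None:
        self.credit = 0.0  # fractional updates carried over between environment steps

    def updates_for_step(self, utd: float) -> int:
        # Add `utd` updates of credit per environment step and spend only whole updates,
        # so the long-run average number of updates per step equals the UTD ratio.
        self.credit += utd
        n = int(self.credit)
        self.credit -= n
        return n
```

Here is how the scheduler behaves at two ratios:

* With a ratio of 0.25, it performs one update every fourth environment step.
* With a ratio of 2.5, it alternates between 2 and 3 updates per step.

Because the ratio only changes how often updates happen, a controller can change it at any time without restarting training.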

Summary (original)

Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update to data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to the default setting in DreamerV2 and that it is competitive with an extensive hyperparameter search which is not feasible for many applications. Our method eliminates the need to set the UTD hyperparameter by hand and even leads to a higher robustness with regard to other learning-related hyperparameters further reducing the amount of necessary tuning.
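The abstract does not state the exact rule for adjusting the ratio. One plausible reading is a multiplicative controller, sketched here as an assumption. Each time the world model's loss on the held-out data is re-measured, the controller compares it with the previous measurement:

* A decrease is treated as a sign of underfitting, and the UTD ratio is raised.
* An unchanged or higher loss is treated as a sign of overfitting, and the ratio is lowered.

`DynamicUTD`, `factor`, and the clipping bounds are hypothetical names and values, not the paper's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicUTD:
    """Hypothetical controller that adapts the UTD ratio from held-out validation loss."""

    utd: float = 1.0        # current gradient updates per collected env step (assumed start value)
    factor: float = 1.3     # multiplicative step size (assumed, not from the paper)
    min_utd: float = 0.01   # assumed safety bounds on the ratio
    max_utd: float = 100.0
    prev_val_loss: Optional[float] = None

    def update(self, val_loss: float) -> float:
        """Record a new validation loss and return the (possibly adjusted) UTD ratio."""
        if self.prev_val_loss is not None:
            if val_loss < self.prev_val_loss:
                # Held-out loss still falling: treat as underfitting, do more updates per datum.
                self.utd *= self.factor
            else:
                # Held-out loss stagnated or rose: treat as overfitting, do fewer updates per datum.
                self.utd /= self.factor
            self.utd = min(max(self.utd, self.min_utd), self.max_utd)
        self.prev_val_loss = val_loss
        return self.utd


if __name__ == "__main__":
    ctrl = DynamicUTD(utd=1.0, factor=2.0)
    for loss in [1.0, 0.8, 0.7, 0.75]:
        print(ctrl.update(loss))
```

With `factor=2.0`, feeding the losses 1.0, 0.8, 0.7, and 0.75 yields the ratios 1.0, 2.0, 4.0, and 2.0. The first call only records a baseline. The next two calls see falling losses and double the ratio. The last call sees a rise and halves it. A multiplicative step keeps the ratio positive and lets it cover several orders of magnitude within a few checks.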

arXiv information

Authors	Nicolai Dorka, Tim Welschehold, Wolfram Burgard
Published	2023-03-17 17:29:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
