The Role of Deep Learning Regularizations on Actors in Offline RL

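Summary

Deep learning regularization techniques such as dropout, layer normalization, and weight decay are widely adopted in building modern artificial neural networks, and they often yield more robust training and better generalization.
In reinforcement learning (RL), however, these techniques have seen limited use: they are usually applied to value function estimators (Hiraoka et al., 2021; Smith et al., 2022), and they may have detrimental effects.
The issue is even more pronounced in the offline RL setting, which closely resembles supervised learning but has received less attention.
Recent work in continuous offline RL (Park et al., 2024) demonstrated that, although sufficiently powerful critic networks can be built, the generalization of actor networks remains a bottleneck.
This study shows empirically that applying standard regularization techniques to the actor networks of offline RL actor-critic algorithms yields an average improvement of 6% across two algorithms and three different continuous D4RL domains.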
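
To make the three regularizers named above concrete, here is a minimal sketch of a small actor network that uses layer normalization and dropout in its hidden layer, plus a decoupled weight-decay step. It is an illustration only: the layer sizes, the dropout rate, the placement of each regularizer, and every function name (`layer_norm`, `dropout`, `actor_forward`, `decay_weights`) are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def actor_forward(state, params, rate, rng, training=True):
    """Hypothetical actor: Linear -> LayerNorm -> ReLU -> Dropout -> Linear -> tanh."""
    h = linear(state, params["w1"], params["b1"])
    h = layer_norm(h)
    h = [max(0.0, v) for v in h]
    h = dropout(h, rate, rng, training)
    out = linear(h, params["w2"], params["b2"])
    return [math.tanh(v) for v in out]  # bounded actions in [-1, 1]


def decay_weights(weights, lr, weight_decay):
    """Decoupled weight decay: shrink every weight by a factor (1 - lr * weight_decay)."""
    return [[w * (1.0 - lr * weight_decay) for w in row] for row in weights]


# Usage with assumed sizes: 3-dimensional state, 4 hidden units, 2-dimensional action.
rng = random.Random(0)
params = {
    "w1": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    "b1": [0.0] * 4,
    "w2": [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
    "b2": [0.0] * 2,
}
# Dropout is disabled at evaluation time, so this call is deterministic.
action = actor_forward([0.1, -0.2, 0.3], params, rate=0.1, rng=rng, training=False)
# Weight decay would be applied alongside each gradient update during training.
params["w1"] = decay_weights(params["w1"], lr=0.1, weight_decay=0.5)
```

In a deep learning framework the same ideas usually come as built-in layers and optimizer options; the plain-Python version only shows what each regularizer does to the actor's activations and weights.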

Summary (original)

Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities. However, in the domain of Reinforcement Learning (RL), the application of these techniques has been limited, usually applied to value function estimators (Hiraoka et al., 2021; Smith et al., 2022), and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL (Park et al., 2024) has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.

arXiv information

Authors	Denis Tarasov, Anja Surina, Caglar Gulcehre
Published	2024-11-21 14:35:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
