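Modern reinforcement learning (RL) can be divided into online and offline variants.
Although the Bellman equation is central to both, current research on it focuses mainly on optimization techniques and performance gains rather than on structural properties of the Bellman error, such as its distribution.
Based on the observation that the Bellman error approximately follows a Logistic distribution, we propose the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss), which assumes normally distributed Bellman errors.
In particular, we applied the Logistic correction to the loss functions of various RL baseline methods and observed that the LLoss results consistently outperformed the MSE results.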
Modern reinforcement learning (RL) can be categorized into online and offline variants. Although the Bellman equation is pivotal to both online and offline RL, current research on it revolves primarily around optimization techniques and performance enhancement rather than the inherent structural properties of the Bellman error, such as its distribution. This study investigates the distribution of the Bellman approximation error through iterative exploration of the Bellman equation and observes that the Bellman error approximately follows a Logistic distribution. Based on this observation, we propose the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss), which assumes a Normal distribution for Bellman errors. We validated this hypothesis through extensive numerical experiments across diverse online and offline environments. In particular, we applied the Logistic correction to the loss functions of various RL baseline methods and observed that the results with LLoss consistently outperformed their MSE counterparts. We also conducted Kolmogorov-Smirnov tests to confirm the reliability of the Logistic distribution. Moreover, our theory connects the Bellman error to the proportional reward scaling phenomenon by providing a distribution-based analysis. Furthermore, we applied a bias-variance decomposition to sampling from the Logistic distribution. The theoretical and empirical insights of this study lay a foundation for future investigations and enhancements centered on the distribution of the Bellman error.
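The abstract does not state the exact form of LLoss or how the Kolmogorov-Smirnov tests were run, so the following is only a minimal sketch of the two ideas under stated assumptions: the loss is taken to be the standard negative log-likelihood of a zero-centred Logistic distribution with a fixed scale, and the test statistic is the usual one-sample Kolmogorov-Smirnov distance. The names `logistic_nll`, `mse`, `logistic_cdf`, `ks_statistic`, the `scale` parameter, and the sample errors are all hypothetical and are not taken from the paper.

```python
import math


def logistic_nll(errors, scale=1.0):
    """Mean negative log-likelihood of errors under Logistic(0, scale).

    Per-sample value: |z| + 2*log(1 + exp(-|z|)) + log(scale), z = e / scale.
    Assumption: zero location and a fixed scale; the paper may differ.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    total = 0.0
    for e in errors:
        z = abs(e) / scale
        total += z + 2.0 * math.log1p(math.exp(-z)) + math.log(scale)
    return total / len(errors)


def mse(errors):
    """Mean squared error, shown for comparison."""
    return sum(e * e for e in errors) / len(errors)


def logistic_cdf(x, scale=1.0):
    """CDF of Logistic(0, scale), written to avoid overflow for large |x|."""
    z = x / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov distance between samples and cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model CDF with the empirical CDF just after and before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Made-up Bellman errors for illustration only.
errors = [0.1, -0.4, 2.5, -0.2]
print("Logistic NLL:", logistic_nll(errors, scale=0.5))
print("MSE:", mse(errors))
print("KS distance:", ks_statistic(errors, lambda x: logistic_cdf(x, scale=0.5)))
```

In this sketch a large error contributes roughly linearly to the Logistic negative log-likelihood but quadratically to the MSE, which is one concrete way the two distributional assumptions lead to different training signals.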
Authors | Outongyi Lv, Bingxin Zhou |
Published | 2023-12-13 14:43:43+00:00 |
arXiv page | arxiv_id(pdf) |
