Time-Constrained Robust MDPs

Abstract

Robust reinforcement learning is essential for deploying reinforcement learning algorithms in real-world scenarios where environmental uncertainty predominates. Traditional robust reinforcement learning often depends on rectangularity assumptions, where adverse probability measures of outcome states are assumed to be independent across different states and actions. This assumption, rarely fulfilled in practice, leads to overly conservative policies. To address this problem, we introduce a new time-constrained robust MDP (TC-RMDP) formulation that considers multifactorial, correlated, and time-dependent disturbances, thus more accurately reflecting real-world dynamics. This formulation goes beyond the conventional rectangularity paradigm, offering new perspectives and expanding the analytical framework for robust RL. We propose three distinct algorithms, each using varying levels of environmental information, and evaluate them extensively on continuous control benchmarks. Our results demonstrate that these algorithms yield an efficient tradeoff between performance and robustness, outperforming traditional deep robust RL methods in time-constrained environments while preserving robustness in classical benchmarks. This study revisits the prevailing assumptions in robust RL and opens new avenues for developing more practical and realistic RL applications.
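As a rough illustration of the contrast drawn in the abstract, the sketch below compares a disturbance parameter that is re-chosen independently at every step (a rectangular-style adversary) with one whose value may only drift by a bounded amount per step. The bounded-step rule, the function names, and the numeric ranges are all assumptions made for illustration; the abstract does not specify how the time constraint is defined, and this is not the authors' algorithm.

```python
import random


def rectangular_disturbance(low, high, steps, rng):
    """Hypothetical rectangular-style adversary: each step's parameter
    is drawn independently of the previous one."""
    return [rng.uniform(low, high) for _ in range(steps)]


def time_constrained_disturbance(low, high, steps, max_step, rng, start=None):
    """Hypothetical time-constrained adversary (assumed rule): consecutive
    parameter values differ by at most max_step."""
    psi = (low + high) / 2 if start is None else start
    path = []
    for _ in range(steps):
        psi += rng.uniform(-max_step, max_step)  # bounded drift
        psi = min(high, max(low, psi))           # stay inside the uncertainty set
        path.append(psi)
    return path


def max_jump(path):
    """Largest change between two consecutive parameter values."""
    return max(abs(b - a) for a, b in zip(path, path[1:]))


rng = random.Random(0)
free = rectangular_disturbance(0.5, 1.5, 200, rng)
slow = time_constrained_disturbance(0.5, 1.5, 200, 0.01, rng)
print("rectangular max jump:     ", max_jump(free))
print("time-constrained max jump:", max_jump(slow))
```

Under these assumptions, the independently drawn parameter can jump across the whole uncertainty set from one step to the next, while the drifting parameter cannot. An agent facing the second kind of disturbance may not need to guard against the worst case at every single step, which is one way to read the claim that rectangularity leads to overly conservative policies.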

arXiv information

Authors	Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson
Published	2024-06-12 16:45:09+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
