Distributional Reinforcement Learning with Online Risk-awareness Adaption

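Summary

Using reinforcement learning (RL) in practical applications requires accounting for sub-optimal outcomes, which depend on how familiar the agent is with its uncertain environment.
Dynamically adjusting the level of epistemic risk over the course of learning can tactically achieve a reliable optimal policy in safety-critical environments and address the sub-optimality of a static risk level.
This work introduces a new framework, Distributional RL with Online Risk Adaption (DRL-ORA), which quantifies aleatory and epistemic uncertainty jointly and dynamically selects the epistemic risk level by solving a total variation minimization problem online.
The risk level can be selected efficiently by grid search with a Follow-The-Leader type algorithm, whose offline oracle is related to the "satisficing measure" (from the decision analysis community) under a special modification of the loss function.
The authors show multiple classes of tasks in which DRL-ORA outperforms existing methods that rely on either a fixed risk level or a manually predetermined risk-level adaptation.
Given the simplicity of the modifications, the authors believe the framework can be easily incorporated into most RL algorithm variants.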
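
As a rough illustration of selecting a risk level by grid search with a Follow-The-Leader rule, the sketch below plays, in each round, the grid point with the smallest cumulative loss so far. It is a generic textbook form of Follow-The-Leader, not the DRL-ORA procedure: the function name, the grid, and the squared-distance loss are all hypothetical stand-ins, since the abstract does not specify the loss used in the total variation minimization problem.

```python
from typing import Callable, List, Sequence


def follow_the_leader(
    grid: Sequence[float],
    losses: Sequence[Callable[[float], float]],
) -> List[float]:
    """Return the risk level played in each round.

    In round t, the grid point with the smallest cumulative loss over
    rounds 0..t-1 is played; ties (including round 0) go to the first
    grid point.
    """
    cumulative = [0.0] * len(grid)
    played = []
    for loss in losses:
        best = min(range(len(grid)), key=lambda i: cumulative[i])
        played.append(grid[best])
        # The round's loss is revealed only after the choice is made,
        # then accumulated for every candidate on the grid.
        for i, alpha in enumerate(grid):
            cumulative[i] += loss(alpha)
    return played


# Hypothetical per-round losses: squared distance to a drifting target level.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
targets = [0.9, 0.9, 0.5, 0.5, 0.5]
losses = [lambda a, t=t: (a - t) ** 2 for t in targets]

choices = follow_the_leader(grid, losses)
print(choices)  # [0.1, 0.9, 0.9, 0.7, 0.7]
```

In this toy run the selected level lags the drifting target, because the rule weighs all past rounds equally; each round costs one loss evaluation per grid point.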

Abstract (original)

The use of reinforcement learning (RL) in practical applications requires considering sub-optimal outcomes, which depend on the agent’s familiarity with the uncertain environment. Dynamically adjusting the level of epistemic risk over the course of learning can tactically achieve reliable optimal policy in safety-critical environments and tackle the sub-optimality of a static risk level. In this work, we introduce a novel framework, Distributional RL with Online Risk Adaption (DRL-ORA), which can quantify the aleatory and epistemic uncertainties compositely and dynamically select the epistemic risk levels via solving a total variation minimization problem online. The risk level selection can be efficiently achieved through grid search using a Follow-The-Leader type algorithm, and its offline oracle is related to ‘satisficing measure’ (in the decision analysis community) under a special modification of the loss function. We show multiple classes of tasks where DRL-ORA outperforms existing methods that rely on either a fixed risk level or manually predetermined risk level adaption. Given the simplicity of our modifications, we believe the framework can be easily incorporated into most RL algorithm variants.

arXiv information

Authors	Yupeng Wu, Wenjie Huang
Published	2024-03-11 15:36:19+00:00
arXiv page	arxiv_id (pdf)

Sources and services used

arxiv.jp, Google
