Optimizing Return Distributions with Distributional Dynamic Programming

Summary

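We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, which include standard reinforcement learning as a special case.
Previous distributional DP methods could optimize the same class of expected utilities as classic DP.
To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, in which the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step).
We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that distributional DP can be used to solve them.
We analyze distributional value iteration and policy iteration, giving bounds and a study of which objectives these distributional DP methods can or cannot optimize.
We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk and homeostatic regulation.
To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN and empirically evaluate the result on instances of the applications discussed.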
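
Conditional value-at-risk, one of the objectives named above, is a statistical functional of the return distribution rather than an expected utility. As a minimal illustration, the sketch below estimates it from a sample of returns as the mean of the worst `alpha` fraction. The function name `empirical_cvar` and the lower-tail, sample-based estimator are assumptions made for this example; they are not taken from the paper.

```python
import math

import numpy as np


def empirical_cvar(returns, alpha):
    """Estimate lower-tail CVaR: the mean of the worst ceil(alpha * n) returns.

    Hypothetical helper for illustration; assumes 0 < alpha <= 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, math.ceil(alpha * len(ordered)))          # size of the lower tail
    return float(ordered[:k].mean())
```

With `alpha = 1.0` the estimate reduces to the ordinary sample mean of the returns, which corresponds to the standard expected-return objective.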

Summary (original)

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.
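
Stock augmentation, as described in the abstract, pairs each MDP state with a statistic of the rewards obtained since the first time step. The sketch below shows one simple way to build such augmented states along a single trajectory. It assumes the statistic is the discounted sum of rewards collected so far; the paper's exact definition of the stock may differ, and the name `augment_with_stock` is hypothetical.

```python
def augment_with_stock(states, rewards, gamma, initial_stock=0.0):
    """Pair each state with the discounted reward sum collected before it.

    Assumption for illustration: the stock at step t is
    initial_stock + sum_{k < t} gamma**k * rewards[k].
    Returns the list of (state, stock) pairs and the final stock.
    """
    augmented = []
    stock = initial_stock
    discount = 1.0
    for state, reward in zip(states, rewards):
        augmented.append((state, stock))  # stock seen on arrival at this state
        stock += discount * reward        # add this step's discounted reward
        discount *= gamma
    return augmented, stock


# Usage: three steps with rewards 1, 2, 4 and discount 0.5.
pairs, final_stock = augment_with_stock(["s0", "s1", "s2"], [1.0, 2.0, 4.0], 0.5)
```

A policy that conditions on the `(state, stock)` pair can react to how much reward has already been accumulated, which is what lets objectives beyond expected utilities be expressed.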

arXiv information

Authors	Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney
Published	2025-01-22 17:20:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
