Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

Summary

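Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, all of which require RL agents to make decisions in the real world.
A major obstacle to adopting RL methods in these domains is the lack of robustness of conventional algorithms.
The paper argues that a fundamental cause of this lack of robustness is the focus on the expected value of the return as the only "correct" optimization objective.
The expected value is the average over a statistical ensemble of infinitely many trajectories.
For non-ergodic returns, this ensemble average differs from the average along a single, infinitely long trajectory.
As a result, optimizing the expected value can produce policies that yield exceptionally high returns with probability zero but almost surely lead to catastrophic outcomes.
This problem can be avoided by transforming the time series of collected returns into one with ergodic increments.
With this transformation, robust policies can be learned by optimizing the long-term return of an individual agent rather than the average across infinitely many trajectories.
The authors propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive non-ergodic environment and on standard RL benchmarks.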

Summary (original)

Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the return as the sole ‘correct’ optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic returns, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected returns into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term return for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive, non-ergodic environment and on standard RL benchmarks.
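
The gap between the ensemble average and the time average can be illustrated with a small simulation. The sketch below is not the algorithm proposed in the paper; it uses a well-known multiplicative coin-toss game as a stand-in for a non-ergodic return, and the factors 1.5 and 0.6, the function names, and the sample sizes are all assumptions chosen for illustration. In this game the expected wealth grows each step, yet the growth rate along a single long trajectory is negative. Taking the logarithm of wealth gives independent, identically distributed increments, which is the kind of ergodicity transformation that applies to purely multiplicative dynamics.

```python
import math
import random

# Hypothetical multiplicative factors for heads and tails (illustration only).
UP, DOWN = 1.5, 0.6


def ensemble_growth_factor():
    """Per-step growth factor of the expected wealth (ensemble average)."""
    return 0.5 * UP + 0.5 * DOWN


def time_average_growth_rate():
    """Per-step growth rate of log-wealth along one very long trajectory.

    The increments of log-wealth are i.i.d., so their mean is what a single
    agent experiences in the long run.
    """
    return 0.5 * math.log(UP) + 0.5 * math.log(DOWN)


def simulate_log_wealth(steps, rng):
    """Final log-wealth of one agent that starts with wealth 1."""
    log_wealth = 0.0
    for _ in range(steps):
        factor = UP if rng.random() < 0.5 else DOWN
        log_wealth += math.log(factor)
    return log_wealth


def fraction_of_losers(agents, steps, seed=0):
    """Share of simulated agents whose final wealth is below the initial wealth."""
    rng = random.Random(seed)
    finals = [simulate_log_wealth(steps, rng) for _ in range(agents)]
    return sum(1 for x in finals if x < 0.0) / len(finals)


if __name__ == "__main__":
    print("ensemble growth factor per step:", ensemble_growth_factor())
    print("time-average growth rate per step:", time_average_growth_rate())
    print("fraction of agents that lost money:", fraction_of_losers(2000, 1000))
```

An objective based on `ensemble_growth_factor` would rate this game as favorable, while `time_average_growth_rate` and the simulated fraction of losing agents show what almost every individual agent actually experiences.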

arXiv information

Authors	Dominik Baumann, Erfaun Noorani, James Price, Ole Peters, Colm Connaughton, Thomas B. Schön
Published	2023-10-17 15:13:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
