Reward Machines for Deep RL in Noisy and Uncertain Environments

Abstract

Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typically relied on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function. Such ground-truth interpretations can be elusive in many real-world settings, due in part to partial observability or noisy sensing. In this paper, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that leverage task structure under uncertain interpretation of domain-specific vocabulary. Theoretical analysis exposes pitfalls in naive approaches to this problem, while experimental results show that our algorithms successfully leverage task structure to improve performance under noisy interpretations of the vocabulary. Our results provide a general framework for exploiting Reward Machines in partially observable environments.
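
To make the setting concrete, here is a minimal Python sketch of a Reward Machine whose current state is tracked as a belief, because the propositions that drive its transitions are only observed with some probability. This is an illustration under stated assumptions, not the algorithms proposed in the paper: the `RewardMachine` class, the `update_belief` helper, the key-and-door task, and the detector probabilities are all hypothetical. The update also treats each step's detections as independent, which is the kind of naive shortcut the abstract says can have pitfalls.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """A tiny Reward Machine: states, labelled transitions, and rewards."""
    initial: str
    # Maps (state, proposition) -> (next state, reward).
    # Pairs that are not listed are treated as self-loops with reward 0.
    transitions: dict

    def step(self, state, prop):
        return self.transitions.get((state, prop), (state, 0.0))


def update_belief(rm, belief, prop_probs):
    """Push a belief over RM states through one step.

    `prop_probs` maps each proposition to the probability that it held
    on this step (hypothetical output of a noisy detector; sums to 1).
    Returns the new belief and the expected reward for the step.
    """
    new_belief = {}
    expected_reward = 0.0
    for state, p_state in belief.items():
        for prop, p_prop in prop_probs.items():
            next_state, reward = rm.step(state, prop)
            weight = p_state * p_prop
            new_belief[next_state] = new_belief.get(next_state, 0.0) + weight
            expected_reward += weight * reward
    return new_belief, expected_reward


# Hypothetical task: pick up a key, then reach a door.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("done", 1.0),
    },
)

belief = {rm.initial: 1.0}
# Step 1: the detector gives an 80% chance that the key was picked up.
belief, reward_1 = update_belief(rm, belief, {"key": 0.8, "none": 0.2})
# Step 2: the detector gives a 50% chance that the door was reached.
belief, reward_2 = update_belief(rm, belief, {"door": 0.5, "none": 0.5})
```

With ground-truth labels the belief would collapse to a single Reward Machine state at every step; under noisy labels the probability mass spreads across states, and an agent has to act on that uncertainty.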

arXiv information

Authors	Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith
Published	2024-06-17 16:39:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
