Planning with a Learned Policy Basis to Optimally Solve Complex Tasks


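Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems.
In tasks described by a finite state automaton (FSA) that involves the same set of subproblems, a combination of (sub)policies, each solving one of those subproblems, can be used to generate an optimal solution without additional learning.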


Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
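The combination step can be illustrated with a small, hypothetical example. The sketch below is not the authors' algorithm. It uses the standard successor-feature identity Q(s) = ψ(s)·w to evaluate every basis policy on a new reward vector w (generalized policy evaluation) and takes the best one (generalized policy improvement). It then runs value iteration over FSA states, where each state's weight on a proposition is the reward for reaching acceptance or the discounted value of continuing from that proposition's exit state. Everything here is an illustrative assumption: the 5-state corridor, the propositions `a` and `b`, the two constant-move basis policies, the FSA "reach b, then a", and the helper names `successor_features`, `gpi`, and `plan`.

```python
# Hypothetical toy setup (assumption): 5-state corridor, propositions "a" and "b".
GAMMA = 0.9
N_STATES = 5
EXITS = {"a": 0, "b": 4}          # proposition -> state where it holds
PROPS = list(EXITS)                # feature order: ["a", "b"]
MOVES = {"left": -1, "right": 1}   # hypothetical policy basis: constant moves

def step(s, move):
    """Deterministic corridor dynamics, clipped at both ends."""
    return min(max(s + move, 0), N_STATES - 1)

def phi(s_next):
    """Transition features: one-hot indicator of the proposition reached, if any."""
    return [1.0 if EXITS[p] == s_next else 0.0 for p in PROPS]

def successor_features(move, sweeps=100):
    """Iterate psi(s) = phi(s') + GAMMA * psi(s'); a subpolicy stops at an exit."""
    psi = [[0.0] * len(PROPS) for _ in range(N_STATES)]
    for _ in range(sweeps):
        new = []
        for s in range(N_STATES):
            s2 = step(s, move)
            cont = [0.0] * len(PROPS) if s2 in EXITS.values() else psi[s2]
            new.append([f + GAMMA * c for f, c in zip(phi(s2), cont)])
        psi = new
    return psi

PSI = {name: successor_features(m) for name, m in MOVES.items()}

def gpi(s, w):
    """GPE: score each basis policy by psi_i(s) . w; GPI: return the best one."""
    scores = {name: sum(p * x for p, x in zip(psi[s], w)) for name, psi in PSI.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical FSA for "reach b, then a"; "acc" is the accepting state.
DELTA = {("u0", "a"): "u0", ("u0", "b"): "u1",
         ("u1", "a"): "acc", ("u1", "b"): "u1"}
FSA_STATES = ["u0", "u1"]  # non-accepting FSA states

def plan(iters=100):
    """Value iteration over (FSA state, env state) with per-FSA-state weights w_u."""
    V = {(u, s): 0.0 for u in FSA_STATES for s in range(N_STATES)}
    W = {}
    for _ in range(iters):
        W = {}
        for u in FSA_STATES:
            w = []
            for p in PROPS:
                u2 = DELTA[(u, p)]
                # Reward 1 on acceptance; otherwise the discounted value of
                # continuing in FSA state u2 from the exit state of p.
                w.append(1.0 if u2 == "acc" else GAMMA * V[(u2, EXITS[p])])
            W[u] = w
        V = {(u, s): gpi(s, W[u])[1] for u in FSA_STATES for s in range(N_STATES)}
    return V, W

V, W = plan()
best, value = gpi(2, W["u0"])
print(best, round(value, 5))  # right 0.59049
```

Starting in the middle of the corridor, the planner first heads right to satisfy `b` and then, in FSA state `u1`, heads left to `a`. No subpolicy is retrained for this new task; only the weights `w_u` change. GPI in general only guarantees a lower bound on the optimal value. This toy happens to recover the optimum (γ^5 = 0.59049), but the sketch makes no claim about the optimality guarantees stated in the abstract.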


Authors: Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof
Published: 2024-03-22 15:51:39+00:00
arXiv: arxiv_id (pdf)

Provider, service used: Google

Categories: cs.AI, cs.LG