Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

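Summary

Title: Sample-efficient model-free reinforcement learning from LTL specifications with optimality guarantees
Summary:
– Linear temporal logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications.
– Learning the optimal policy from an LTL specification is not trivial. This work proposes a model-free reinforcement learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system modelled as a Markov decision process (MDP).
– It proposes a novel and more general product MDP, reward structure, and discounting mechanism that, applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification.
– It also provides improved theoretical results on choosing the key RL parameters so that optimality is guaranteed.
– To evaluate the learned policy directly, the probabilistic model checker PRISM is used to compute the probability that the policy satisfies the specification.
– Experiments on several tabular MDP environments across different LTL tasks demonstrate improved sample efficiency and convergence to the optimal policy.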

Abstract (original)

Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
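As a rough illustration of the general product-MDP idea the abstract refers to, the sketch below pairs a tiny MDP with a deterministic automaton and runs tabular Q-learning on the combined state. Everything in it is an assumption made for illustration: the corridor environment, the labelling function, the simple reachability task "eventually goal", the reward on first acceptance, and all names and parameter values. It is not the paper's product construction, reward structure, or discounting mechanism, and it does not include the PRISM evaluation step.

```python
import random

# Hypothetical 1-D corridor MDP: states 0..4, actions -1/+1, slip probability 0.1.
N_STATES = 5
ACTIONS = (-1, +1)
SLIP = 0.1

def label(s):
    """Atomic propositions that hold in MDP state s (assumed labelling)."""
    return {"goal"} if s == N_STATES - 1 else set()

def automaton_step(q, props):
    """Deterministic automaton for 'eventually goal': 0 = waiting, 1 = accepting sink."""
    return 1 if q == 1 or "goal" in props else 0

def product_step(s, q, a, rng):
    """One product transition: move in the MDP, then advance the automaton."""
    move = -a if rng.random() < SLIP else a          # action may slip
    s2 = min(max(s + move, 0), N_STATES - 1)         # walls at both ends
    q2 = automaton_step(q, label(s2))
    reward = 1.0 if (q == 0 and q2 == 1) else 0.0    # reward on first acceptance
    return s2, q2, reward

def q_learning(episodes=2000, horizon=30, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning over product states (s, q)."""
    rng = random.Random(seed)
    Q = {(s, q, a): 0.0 for s in range(N_STATES) for q in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, q, b)])
            s2, q2, r = product_step(s, q, a, rng)
            best_next = max(Q[(s2, q2, b)] for b in ACTIONS)
            Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
            s, q = s2, q2
            if q == 1:                               # task satisfied, end episode
                break
    return Q

def greedy_policy(Q):
    """Greedy action per MDP state while the automaton is still waiting."""
    return {s: max(ACTIONS, key=lambda b: Q[(s, 0, b)]) for s in range(N_STATES - 1)}
```

Calling `greedy_policy(q_learning())` returns a dictionary from corridor positions to actions. For objectives that require visiting accepting states infinitely often, a plain discounted reward like this one does not by itself relate the learned values to the satisfaction probability, which is the kind of gap the abstract's reward and discounting design is said to address.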

arXiv information

Authors	Daqian Shao, Marta Kwiatkowska
Published	2023-05-02 12:57:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
