Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization

Summary

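Integrating reinforcement learning (RL) with formal methods has emerged as a promising framework for solving long-horizon planning problems.
Conventional approaches typically abstract the state and action spaces and rely on manually created labeling functions or predicates.
However, these approaches become less efficient as tasks grow more complex, because the size of the labeling functions or predicates grows exponentially.
To address these issues, the authors propose VFSTL, a scalable model-based RL framework that schedules pre-trained skills to follow unseen STL specifications without hand-crafted predicates.
Given a set of value functions obtained by goal-conditioned RL, they formulate an optimization problem that maximizes the robustness value of specifications written in Signal Temporal Logic (STL), where the robustness is computed using the value functions as predicates.
To further reduce the computational burden, the environment state space is abstracted into the value function space (VFS).
The optimization problem is then solved by model-based reinforcement learning.
Simulation results show that STL with value functions as predicates approximates the ground-truth robustness, and that planning in the VFS directly achieves unseen specifications using sensor data.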
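
The idea of using value functions as STL predicates, described in the summary above, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes each atomic predicate has the form "V_i(s_t) > c" with robustness V_i(s_t) - c, and that a trace is a list of points in the value function space. The helper names (predicate, eventually, always, and so on), the thresholds, and the example trace are all hypothetical.

```python
from typing import Callable, Sequence

# A trace is a sequence of VFS points: trace[t][i] stands for V_i(s_t),
# the output of the i-th pre-trained skill's value function at step t.
Trace = Sequence[Sequence[float]]
Robustness = Callable[[Trace, int], float]


def predicate(skill: int, threshold: float) -> Robustness:
    """Atomic predicate V_skill(s_t) > threshold (assumed form); robustness is the margin."""
    return lambda trace, t: trace[t][skill] - threshold


def neg(phi: Robustness) -> Robustness:
    """Negation flips the sign of the robustness."""
    return lambda trace, t: -phi(trace, t)


def conj(phi: Robustness, psi: Robustness) -> Robustness:
    """Conjunction takes the minimum of the two robustness values."""
    return lambda trace, t: min(phi(trace, t), psi(trace, t))


def _window(trace: Trace, t: int, a: int, b: int) -> range:
    """Time steps in [t + a, t + b], clipped to the end of the finite trace."""
    return range(t + a, min(t + b, len(trace) - 1) + 1)


def eventually(phi: Robustness, a: int, b: int) -> Robustness:
    """F_[a,b] phi: maximum over the window (-inf if the window is empty)."""
    return lambda trace, t: max(
        (phi(trace, k) for k in _window(trace, t, a, b)), default=float("-inf")
    )


def always(phi: Robustness, a: int, b: int) -> Robustness:
    """G_[a,b] phi: minimum over the window (+inf if the window is empty)."""
    return lambda trace, t: min(
        (phi(trace, k) for k in _window(trace, t, a, b)), default=float("inf")
    )


# Hypothetical 4-step trace with two skills.
trace = [[0.2, 0.1], [0.5, 0.2], [0.9, 0.3], [0.6, 0.4]]

# "Eventually V_0 > 0.8 within 3 steps, and always V_1 <= 0.5 over the same window."
spec = conj(
    eventually(predicate(0, 0.8), 0, 3),
    always(neg(predicate(1, 0.5)), 0, 3),
)

rho = spec(trace, 0)
print(f"robustness = {rho:.3f}")
```

A positive robustness means the trace satisfies the specification with some margin, and a negative one means it is violated. In a setting like the one summarized above, a planner would search over skill schedules for the one whose predicted VFS trace maximizes this quantity.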

Abstract (original)

The integration of reinforcement learning (RL) and formal methods has emerged as a promising framework for solving long-horizon planning problems. Conventional approaches typically involve abstraction of the state and action spaces and manually created labeling functions or predicates. However, the efficiency of these approaches deteriorates as the tasks become increasingly complex, which results in exponential growth in the size of labeling functions or predicates. To address these issues, we propose a scalable model-based RL framework, called VFSTL, which schedules pre-trained skills to follow unseen STL specifications without using hand-crafted predicates. Given a set of value functions obtained by goal-conditioned RL, we formulate an optimization problem to maximize the robustness value of Signal Temporal Logic (STL) defined specifications, which is computed using value functions as predicates. To further reduce the computation burden, we abstract the environment state space into the value function space (VFS). Then the optimization problem is solved by Model-Based Reinforcement Learning. Simulation results show that STL with value functions as predicates approximates the ground truth robustness and the planning in VFS directly achieves unseen specifications using data from sensors.

arXiv information

Authors	Yiting He, Peiran Liu, Yiding Ji
Published	2024-08-04 04:34:29+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
