Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

Abstract

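This work develops Comp-LTL, a zero-shot mechanism that lets an agent satisfy a Linear Temporal Logic (LTL) specification using existing task primitives trained via reinforcement learning (RL).
Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time.
Prior work focuses on learning policies that execute a task specified in LTL, but it builds the specification into the learning process.
Any change to the specification therefore requires retraining the policy, either by fine-tuning or from scratch.
We present a more flexible approach: learn a set of composable task primitive policies that can satisfy arbitrary LTL specifications without retraining or fine-tuning.
Task primitives can be learned offline using RL and combined at deployment using Boolean composition.
This work focuses on creating and pruning a transition system (TS) representation of the environment in order to find deterministic, non-ambiguous, and feasible solutions to LTL specifications, given an environment and a set of task primitive policies.
We show that the pruned TS is deterministic, contains no unrealizable transitions, and is sound.
We verify the approach in simulation and compare it with other state-of-the-art approaches, showing that Comp-LTL is safer and more adaptable.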

Abstract (original)

This work develops a zero-shot mechanism, Comp-LTL, for an agent to satisfy a Linear Temporal Logic (LTL) specification given existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work focuses on learning policies for executing a task specified using LTL, but they incorporate the specification into the learning process. Any change to the specification requires retraining the policy, either via fine-tuning or from scratch. We present a more flexible approach — to learn a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives can be learned offline using RL and combined using Boolean composition at deployment. This work focuses on creating and pruning a transition system (TS) representation of the environment in order to solve for deterministic, non-ambiguous, and feasible solutions to LTL specifications given an environment and a set of task primitive policies. We show that our pruned TS is deterministic, contains no unrealizable transitions, and is sound. We verify our approach via simulation and compare it to other state of the art approaches, showing that Comp-LTL is safer and more adaptable.
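
The abstract does not say how task primitives are combined "using Boolean composition". One common approach in the compositional RL literature is to take the elementwise minimum of value functions for conjunction and the maximum for disjunction. The sketch below illustrates that idea only; it is an assumption, not the Comp-LTL implementation, and the names `QTable`, `q_and`, `q_or`, `greedy_action` and the toy Q-values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: state name -> one Q-value per action.
QTable = Dict[str, List[float]]


def q_and(q1: QTable, q2: QTable) -> QTable:
    """Conjunction of two primitives: elementwise min of their Q-values (assumed rule)."""
    return {s: [min(a, b) for a, b in zip(q1[s], q2[s])] for s in q1}


def q_or(q1: QTable, q2: QTable) -> QTable:
    """Disjunction of two primitives: elementwise max of their Q-values (assumed rule)."""
    return {s: [max(a, b) for a, b in zip(q1[s], q2[s])] for s in q1}


def greedy_action(q: QTable, state: str) -> int:
    """Index of the highest-valued action in the given state."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


# Toy Q-values for two pretrained primitives, one state and three actions.
q_a: QTable = {"s0": [0.9, 0.1, 0.6]}  # primitive "reach a"
q_b: QTable = {"s0": [0.1, 0.8, 0.7]}  # primitive "reach b"

# No retraining happens here: composition is a cheap operation at deployment.
action_for_a_and_b = greedy_action(q_and(q_a, q_b), "s0")  # 2
action_for_a_or_b = greedy_action(q_or(q_a, q_b), "s0")    # 0
```

In this toy case the conjunction picks action 2, the only action that scores reasonably for both primitives, while the disjunction picks action 0, the best action for either one alone.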

arXiv information

Authors	Taylor Bergeron, Zachary Serlin, Kevin Leahy
Published	2024-12-16 18:39:49+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
