Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation

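Summary

Reinforcement learning (RL) based methods are increasingly studied for robot learning.
However, RL-based methods often suffer from low sampling efficiency during the exploration phase, especially in long-horizon manipulation tasks, and they generally ignore task-level semantic information, which delays convergence or even causes task failure.
To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL) that uses three levels of decision layers to improve the agent's performance.
Specifically, task specifications are encoded in linear temporal logic (LTL), which improves performance and provides interpretability.
A waypoint planning module, designed to use feedback from the LTL-encoded task level, serves as the high-level policy and improves exploration efficiency.
The middle-level policy selects which behavior primitive to execute, and the low-level policy specifies the corresponding parameters for interacting with the environment.
We evaluate HyTL on four challenging manipulation tasks and demonstrate its effectiveness and interpretability.
Our project is available at https://sites.google.com/view/hytl-0257/.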

Abstract (original)

Reinforcement Learning (RL) based methods have been increasingly explored for robot learning. However, RL-based methods often suffer from low sampling efficiency in the exploration phase, especially for long-horizon manipulation tasks, and generally neglect the semantic information from the task level, resulting in delayed convergence or even task failure. To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL) which leverages three-level decision layers to improve the agent’s performance. Specifically, the task specifications are encoded via linear temporal logic (LTL) to improve performance and offer interpretability. A waypoint planning module is designed with feedback from the LTL-encoded task level as a high-level policy to improve exploration efficiency. The middle-level policy selects which behavior primitives to execute, and the low-level policy specifies the corresponding parameters to interact with the environment. We evaluate HyTL on four challenging manipulation tasks, which demonstrate its effectiveness and interpretability. Our project is available at: https://sites.google.com/view/hytl-0257/.
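
The three decision levels described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: the task ("eventually grasp, then eventually place"), the hand-built automaton, the waypoint table, and every function name below are hypothetical, and the learned policies are replaced by lookup tables and small random noise purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical task "eventually grasp, then eventually place", hand-compiled
# into a three-state automaton. State 2 is accepting. A real system would
# derive such an automaton from an LTL formula with a dedicated tool.
TRANSITIONS = {(0, "grasped"): 1, (1, "placed"): 2}
ACCEPTING = 2

def progress(state, event):
    """Advance the task automaton on one observed event label."""
    return TRANSITIONS.get((state, event), state)

@dataclass
class Action:
    primitive: str   # discrete choice made by the middle level
    params: tuple    # continuous parameters chosen by the low level

# Hypothetical lookup tables standing in for learned policies.
WAYPOINTS = {0: (0.4, 0.0, 0.1), 1: (0.6, 0.2, 0.1)}
PRIMITIVES = {0: "grasp", 1: "place"}

def high_level(state):
    """High level: pick a waypoint from task-level (automaton) feedback."""
    return WAYPOINTS[state]

def mid_level(state):
    """Middle level: pick which behavior primitive to execute."""
    return PRIMITIVES[state]

def low_level(waypoint, rng):
    """Low level: pick continuous parameters near the waypoint."""
    return tuple(round(c + rng.uniform(-0.01, 0.01), 3) for c in waypoint)

def step(state, rng):
    """Compose the three levels into one hybrid (discrete + continuous) action."""
    waypoint = high_level(state)
    primitive = mid_level(state)
    return Action(primitive, low_level(waypoint, rng))

def run_episode(seed=0):
    """Roll out the toy policy until the automaton reaches its accepting state."""
    rng = random.Random(seed)
    state, trace = 0, []
    while state != ACCEPTING:
        action = step(state, rng)
        trace.append(action.primitive)
        # Toy environment: every primitive succeeds and emits its event.
        event = {"grasp": "grasped", "place": "placed"}[action.primitive]
        state = progress(state, event)
    return state, trace
```

The automaton state doubles as a readable progress indicator, which is one way the interpretability claim can be understood: at any step, the current state says which part of the task specification remains to be satisfied.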

arXiv information

Authors	Hao Zhang, Hao Wang, Xiucai Huang, Wenrui Chen, Zhen Kan
Published	2024-12-29 03:34:53+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
