Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Summary

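Automaton-based approaches have enabled robots to perform a variety of complex tasks.
However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, which limits their applicability in deep reinforcement learning.
To address this issue, the authors incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first, the LTL instruction is encoded by a Transformer module so that task instructions are understood efficiently during training; then, the context variable is encoded by a Transformer again to improve task performance.
In particular, the LTL instruction is specified in co-safe LTL.
LTL progression, a semantics-preserving rewriting operation, is used to decompose the complex task into learnable sub-goals. This not only converts the non-Markovian reward decision process into a Markovian one, but also improves sampling efficiency by learning multiple sub-tasks simultaneously.
An environment-agnostic LTL pre-training scheme is further incorporated to facilitate learning of the Transformer module, resulting in an improved representation of LTL.
Simulation results demonstrate the effectiveness of the T2TL framework.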
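
The LTL progression step mentioned in the summary can be illustrated with a small sketch. This is not the authors' code: the tuple-based formula encoding, the function names (`progress`, `reward_of`), and the +1/-1/0 reward convention are all assumptions made for illustration. The idea is that after each environment step the formula is rewritten using the propositions that currently hold, so the remaining formula records what is still left to do and the reward depends only on the current (state, formula) pair.

```python
# Minimal LTL progression sketch (illustrative only, not the paper's code).
# Assumed formula encoding:
#   True / False                      constants
#   "a"                               atomic proposition
#   ("not", f), ("and", f, g), ("or", f, g)
#   ("next", f), ("until", f, g), ("eventually", f)


def _not(f):
    # Fold constants; otherwise keep the negation symbolic.
    return (not f) if isinstance(f, bool) else ("not", f)


def _and(f, g):
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    return f if f == g else ("and", f, g)


def _or(f, g):
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    return f if f == g else ("or", f, g)


def progress(formula, true_props):
    """Rewrite `formula` given the set of propositions true at this step."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):  # atomic proposition
        return formula in true_props
    op = formula[0]
    if op == "not":
        return _not(progress(formula[1], true_props))
    if op == "and":
        return _and(progress(formula[1], true_props),
                    progress(formula[2], true_props))
    if op == "or":
        return _or(progress(formula[1], true_props),
                   progress(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "until":  # f U g  ->  prog(g) or (prog(f) and f U g)
        return _or(progress(formula[2], true_props),
                   _and(progress(formula[1], true_props), formula))
    if op == "eventually":  # F f  ->  prog(f) or F f
        return _or(progress(formula[1], true_props), formula)
    raise ValueError(f"unknown operator: {op!r}")


def reward_of(formula):
    # Assumed convention: +1 when satisfied, -1 when violated, 0 otherwise.
    if formula is True:
        return 1
    if formula is False:
        return -1
    return 0


# "Eventually reach a, and eventually reach b" (a co-safe task).
task = ("and", ("eventually", "a"), ("eventually", "b"))
after_a = progress(task, {"a"})      # ("eventually", "b")
after_b = progress(after_a, {"b"})   # True
print(after_a, after_b, reward_of(after_b))
```

The sketch only folds constants and removes exact duplicates, so progressed formulas for deeply nested tasks can grow; a fuller implementation would need a proper formula simplifier.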

Summary (original)

Automaton based approaches have enabled robots to perform various complex tasks. However, most existing automaton based algorithms highly rely on the manually customized representation of states for the considered task, limiting its applicability in deep reinforcement learning algorithms. To address this issue, by incorporating Transformer into reinforcement learning, we develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural feature of Transformer twice, i.e., first encoding the LTL instruction via the Transformer module for efficient understanding of task instructions during the training and then encoding the context variable via the Transformer again for improved task performance. Particularly, the LTL instruction is specified by co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts non-Markovian reward decision processes to Markovian ones, but also improves the sampling efficiency by simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module resulting in an improved representation of LTL. The simulation results demonstrate the effectiveness of the T2TL framework.

arXiv information

Authors	Hao Zhang, Hao Wang, Zhen Kan
Published	2023-07-17 06:08:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
