Reward Machine Inference for Robotic Manipulation

Summary

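Learning from Demonstrations (LfD) and Reinforcement Learning (RL) have enabled robot agents to perform complex tasks.
Reward Machines (RMs) strengthen RL's ability to train policies over long time horizons by structuring high-level task information.
This work introduces a new LfD approach that learns RMs directly from visual demonstrations of robotic manipulation tasks.
Unlike previous methods, the approach requires neither predefined propositions nor prior knowledge of the underlying sparse reward signals.
Instead, it jointly learns the RM structure and identifies the key high-level events that drive transitions between RM states.
The method is validated on vision-based manipulation tasks, showing that the inferred RM accurately captures the task structure and lets an RL agent effectively learn an optimal policy.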

Summary (original)

Learning from Demonstrations (LfD) and Reinforcement Learning (RL) have enabled robot agents to accomplish complex tasks. Reward Machines (RMs) enhance RL’s capability to train policies over extended time horizons by structuring high-level task information. In this work, we introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks. Unlike previous methods, our approach requires no predefined propositions or prior knowledge of the underlying sparse reward signals. Instead, it jointly learns the RM structure and identifies key high-level events that drive transitions between RM states. We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.
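
To make the idea of "transitions between RM states" driven by high-level events concrete, here is a minimal sketch of a reward machine as a finite-state machine. It is an illustration only, not the authors' implementation: the `RewardMachine` class, the event names (`block_grasped`, `block_placed`, `block_dropped`), and the reward values are all hypothetical, and the events are written by hand here, whereas the paper infers them from visual demonstrations.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Toy reward machine: a finite-state machine whose edges carry rewards."""

    initial_state: int
    terminal_states: frozenset
    # Maps (state, event) -> (next_state, reward). All entries are hypothetical.
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        """Return (next_state, reward).

        Events with no matching edge leave the state unchanged and give 0 reward.
        """
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Feed a sequence of high-level events; stop once a terminal state is reached."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical two-stage pick-and-place task (assumed for illustration).
rm = RewardMachine(
    initial_state=0,
    terminal_states=frozenset({2}),
    transitions={
        (0, "block_grasped"): (1, 0.0),  # stage 1 done: object is in the gripper
        (1, "block_placed"): (2, 1.0),   # task complete: sparse reward is paid here
        (1, "block_dropped"): (0, 0.0),  # failure: fall back to the initial stage
    },
)

final_state, total_reward = rm.run(["block_grasped", "block_placed"])
```

In a setup like this, an RL agent would typically condition its policy on the current RM state in addition to the image observation, so that the sparse end-of-task reward is broken into stages.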

arXiv information

Authors	Mattijs Baert, Sam Leroux, Pieter Simoens
Published	2024-12-13 12:32:53+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
