Learning Robust Reward Machines from Noisy Labels

Summary

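This paper introduces PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces.
A key aspect of RM-driven RL is its use of a finite-state machine that decomposes the agent's task into distinct subtasks.
To learn RMs from noisy traces, PROB-IRM uses a state-of-the-art inductive logic programming framework that is robust to noisy examples and works with Bayesian posterior degrees of belief, which makes the learning robust to inconsistencies.
Central to the results is the interleaving of RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM.
To speed up training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces.
The experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent that solves its tasks successfully.
Despite the complexity of learning an RM from noisy traces, agents trained with PROB-IRM perform comparably to agents that are given handcrafted RMs.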
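
To make the ideas in the summary concrete, here is a minimal Python sketch of a reward machine as a finite-state machine, a belief over its states under noisy labels, and a potential-based shaping term computed from that belief. It is an illustration under stated assumptions, not PROB-IRM's actual formulation: the two-subtask task, the labels "a" and "b", the state names, the belief update, and the distance-based potential are all hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical task with two subtasks: observe "a" first, then "b".
# Transitions map (state, label) -> next state; unlisted pairs self-loop.
TRANSITIONS = {
    ("u0", "a"): "u1",
    ("u1", "b"): "u_acc",
}
STATES = ["u0", "u1", "u_acc"]
INITIAL, ACCEPTING = "u0", "u_acc"


def step(state, label):
    """Advance the reward machine by one label (self-loop if no edge matches)."""
    return TRANSITIONS.get((state, label), state)


def accepts(trace):
    """Return True if the label trace drives the machine to the accepting state."""
    state = INITIAL
    for label in trace:
        state = step(state, label)
    return state == ACCEPTING


def belief_update(belief, label_probs):
    """Propagate a distribution over RM states through one noisy observation.

    label_probs maps each candidate label (None means "no label") to the
    probability that it is the true label; values are assumed to sum to 1.
    """
    new_belief = {s: 0.0 for s in STATES}
    for state, p_state in belief.items():
        for label, p_label in label_probs.items():
            new_belief[step(state, label)] += p_state * p_label
    return new_belief


def distances_to_accepting():
    """Breadth-first search backwards from the accepting state.

    Assumes every state can reach the accepting state, as in this example.
    """
    dist = {ACCEPTING: 0}
    queue = deque([ACCEPTING])
    while queue:
        target = queue.popleft()
        for (src, _label), dst in TRANSITIONS.items():
            if dst == target and src not in dist:
                dist[src] = dist[target] + 1
                queue.append(src)
    return dist


def expected_potential(belief, dist):
    """Potential (assumed here): negative expected distance to acceptance."""
    return -sum(p * dist[s] for s, p in belief.items())


def shaping_reward(belief, next_belief, dist, gamma=0.99):
    """Potential-based shaping term: gamma * Phi(next) - Phi(current)."""
    return gamma * expected_potential(next_belief, dist) - expected_potential(belief, dist)


# Usage: the labelling function reports "a" with 90% confidence.
belief = {"u0": 1.0, "u1": 0.0, "u_acc": 0.0}
next_belief = belief_update(belief, {"a": 0.9, None: 0.1})
dist = distances_to_accepting()
bonus = shaping_reward(belief, next_belief, dist)
```

In this sketch the shaping bonus is positive because most of the belief mass moves one subtask closer to acceptance. A trace such as `["b", "a"]` is not accepted by this machine, which is the kind of mismatch that would trigger relearning in an interleaved scheme like the one described above.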

Summary (original)

This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent’s task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degree of beliefs, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.

arXiv information

Authors	Roko Parac, Lorenzo Nodari, Leo Ardon, Daniel Furelos-Blanco, Federico Cerutti, Alessandra Russo
Published	2025-03-21 14:07:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
