Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

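Summary

Reasoning is a pivotal component of how human intelligence arrives at generalizable solutions. By summarizing part-to-whole arguments and discovering cause-and-effect relations, it offers reinforcement learning (RL) agents great potential to generalize across varied goals.
However, how to discover and represent causal relations remains a large gap that hinders the development of causal RL.
This paper augments goal-conditioned RL (GCRL) with a causal graph (CG), a structure built on the relations between objects and events.
The authors newly formulate the GCRL problem as variational likelihood maximization with the CG as a latent variable.
To optimize the derived objective, they propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of the CG, and using the CG to learn generalizable models and interpretable policies.
Because public benchmarks that verify reasoning-based generalization are lacking, they design nine tasks and empirically show the effectiveness of the proposed method against five baselines on these tasks.
Further theoretical analysis shows that the performance improvement is attributable to a virtuous cycle of causal discovery, transition modeling, and policy training, which agrees with the experimental evidence from extensive ablation studies.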
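
As a rough illustration of what "variational likelihood maximization with the CG as a latent variable" usually means, the generic evidence lower bound for a latent graph can be written as below. This is the textbook bound, not the objective derived in the paper. The symbols are assumptions introduced here for illustration: \tau is observed trajectory data, g is the goal, G is the causal graph, q(G) is an approximate posterior over graphs, and p(G) is a prior assumed not to depend on g.

```latex
\log p(\tau \mid g)
  = \log \sum_{G} p(\tau \mid G, g)\, p(G)
  \;\ge\; \mathbb{E}_{q(G)}\big[\log p(\tau \mid G, g)\big]
  - \mathrm{KL}\big(q(G)\,\|\,p(G)\big)
```

Under this reading, one plausible interpretation of the two alternating steps is that the first updates q(G) with the model held fixed, and the second updates the model and policy terms inside the expectation with q(G) held fixed.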

Abstract (original)

As a pivotal component to attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents’ generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with Causal Graph (CG), a structure built upon the relation between objects and events. We novelly formulate the GCRL problem into variational likelihood maximization with CG as latent variables. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of CG; using CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and then empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.

arXiv information

Authors	Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao
Published	2023-05-17 16:29:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
