Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning

Summary

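Reinforcement learning (RL) agents have long aimed to approach the efficiency of human learning.
Humans are skilled observers who can learn by aggregating external knowledge from a variety of sources, including observing the policies other people follow when attempting a task.
Earlier RL work has incorporated external knowledge policies to help agents improve sample efficiency.
However, arbitrarily combining and replacing those policies, a capability essential for generalization and transferability, is still non-trivial.
This work introduces Knowledge-Grounded RL (KGRL), an RL paradigm that fuses multiple knowledge policies and aims for human-like efficiency and flexibility.
The authors propose the Knowledge-Inclusive Attention Network (KIAN), a new actor architecture for KGRL whose embedding-based attentive action prediction allows knowledge policies to be rearranged freely.
Through a new design of policy distributions, KIAN also addresses entropy imbalance, a problem that arises in maximum entropy KGRL and prevents an agent from exploring the environment efficiently.
The experimental results show that KIAN outperforms alternative methods that incorporate external knowledge policies and achieves efficient, flexible learning.
The implementation is available at https://github.com/Pascalson/KGRL.git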

Abstract (original)

Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning. Humans are great observers who can learn by aggregating external knowledge from various sources, including observations from others’ policies of attempting a task. Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency. However, it remains non-trivial to perform arbitrary combinations and replacements of those policies, an essential feature for generalization and transferability. In this work, we present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility. We propose a new actor architecture for KGRL, Knowledge-Inclusive Attention Network (KIAN), which allows free knowledge rearrangement due to embedding-based attentive action prediction. KIAN also addresses entropy imbalance, a problem arising in maximum entropy KGRL that hinders an agent from efficiently exploring the environment, through a new design of policy distributions. The experimental results demonstrate that KIAN outperforms alternative methods incorporating external knowledge policies and achieves efficient and flexible learning. Our implementation is available at https://github.com/Pascalson/KGRL.git
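
As a rough illustration of what "embedding-based attentive action prediction" could look like, the sketch below scores each policy by comparing a state-dependent query embedding with a per-policy key embedding, turns the scores into attention weights, and mixes the policies' action distributions with those weights. This is not the authors' implementation (see the repository linked in the abstract). The function names `attention_weights` and `fuse_policies`, the scaled dot-product scoring, the discrete action space, and all toy values are assumptions made for illustration, and the sketch does not model KIAN's handling of entropy imbalance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product score between the state query and each policy key.

    Assumption: one key embedding per policy, same dimension as the query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def fuse_policies(query, keys, action_probs):
    """Mix per-policy action distributions using the attention weights."""
    weights = attention_weights(query, keys)
    num_actions = len(action_probs[0])
    fused = [
        sum(w * probs[a] for w, probs in zip(weights, action_probs))
        for a in range(num_actions)
    ]
    return weights, fused


# Hypothetical toy setup: three policies, three discrete actions.
query = [1.0, 0.0]                              # embedding of the current state
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # one key embedding per policy
action_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]

weights, fused = fuse_policies(query, keys, action_probs)

# Reordering the policies (keys and distributions together) leaves the
# fused distribution unchanged, up to floating-point rounding.
order = [2, 0, 1]
_, fused_reordered = fuse_policies(
    query, [keys[i] for i in order], [action_probs[i] for i in order]
)

# Dropping a policy only removes its row; nothing else has to change shape.
weights_two, fused_two = fuse_policies(query, keys[:2], action_probs[:2])
```

Because each policy enters only through its own key embedding and action distribution, the set of policies can be reordered, extended, or reduced without changing the shape of anything else in this sketch, which is one plausible reading of the "free knowledge rearrangement" the abstract describes.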

arXiv information

Authors	Zih-Yun Chiu, Yi-Lin Tuan, William Yang Wang, Michael C. Yip
Published	2023-10-09 18:17:46+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
