TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Summary

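Learning from rewards (i.e., reinforcement learning, or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches to solving sequential decision-making problems.
To combine the benefits of these two forms of learning, it is common to train a policy that maximizes a combination of the reinforcement learning and teacher-student learning objectives.
However, because there has been no principled method for balancing these objectives, prior work relied on heuristics and problem-specific hyperparameter searches.
We present a $\textit{principled}$ approach, along with an approximate implementation, for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards.
The main idea is to adjust the importance of teacher supervision by comparing the agent's performance with the counterfactual scenario in which the agent learns from rewards alone, without teacher supervision.
If using teacher supervision improves performance, its importance is increased; otherwise it is decreased.
Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyperparameter tuning.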

Abstract (original)

Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent’s performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.
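The balancing idea in the abstract can be illustrated with a small Python sketch. This is not the paper's update rule, which the abstract does not spell out; it only shows the stated direction of adjustment. The function names, the additive update, the `step_size` parameter, and the clipping at zero are all assumptions made for illustration.

```python
def combined_objective(rl_objective, teacher_objective, teacher_weight):
    """Weighted sum of the two objectives (hypothetical form, assumed here)."""
    return rl_objective + teacher_weight * teacher_objective


def update_teacher_weight(teacher_weight, guided_return, reward_only_return,
                          step_size=0.1, min_weight=0.0):
    """Raise the weight when the teacher-guided agent outperforms the
    reward-only (counterfactual) agent, lower it otherwise.

    The additive rule and step_size are illustrative assumptions,
    not the update used in the paper.
    """
    performance_gap = guided_return - reward_only_return
    new_weight = teacher_weight + step_size * performance_gap
    # Keep the weight non-negative so the teacher term is never inverted.
    return max(min_weight, new_weight)


if __name__ == "__main__":
    weight = 1.0
    # Made-up (guided_return, reward_only_return) pairs, for demonstration only.
    for guided, reward_only in [(5.0, 3.0), (4.0, 4.5), (2.0, 6.0)]:
        weight = update_teacher_weight(weight, guided, reward_only)
        print(f"guided={guided}, reward_only={reward_only}, weight={weight:.2f}")
```

In this sketch the weight grows while following the teacher pays off and shrinks toward zero once the reward-only agent does better, so the policy gradually falls back to pure reinforcement learning when the teacher stops helping.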

arXiv information

Authors	Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
Published	2023-07-06 17:58:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
