Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions

Summary

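Reinforcement learning (RL) is promising for control problems, but its practical use is often held back by the complexity of intricate reward functions that include constraints.
The reward hypothesis suggests that these competing demands can be folded into a single scalar reward function, yet designing such a function remains difficult.
Building on existing work, we start by formulating preferences over trajectories and use them to derive a realistic reward function that balances goal achievement against constraint satisfaction for mobile robots operating among dynamic obstacles.
To mitigate reward exploitation in such complex settings, we propose a novel two-stage reward curriculum combined with a flexible replay buffer that samples experiences adaptively.
The agent first learns on a subset of the rewards and then transitions to the full reward, which lets it learn the trade-offs between objectives and constraints.
After a transition to a new stage, the method keeps using past experiences by updating their rewards, which makes learning sample-efficient.
On robot navigation tasks, the approach outperforms the baselines in both true reward achieved and task completion.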

Abstract (original)

Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints. While the reward hypothesis suggests these competing demands can be encapsulated in a single scalar reward function, designing such functions remains challenging. Building on existing work, we start by formulating preferences over trajectories to derive a realistic reward function that balances goal achievement with constraint satisfaction in the application of mobile robotics with dynamic obstacles. To mitigate reward exploitation in such complex settings, we propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences. Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints. After transitioning to a new stage, our method continues to make use of past experiences by updating their rewards for sample-efficient learning. We investigate the efficacy of our approach in robot navigation tasks and demonstrate superior performance compared to baselines in terms of true reward achievement and task completion, underlining its effectiveness.
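
The staged-reward and reward-updating ideas in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Transition` fields, the goal and collision terms, the penalty value of 10, the choice of the goal term as the stage-one subset, and the class name `RelabelingReplayBuffer`. The sketch also samples uniformly, so it does not reproduce the adaptive sampling the abstract mentions. It shows only one point: if the buffer stores what is needed to recompute rewards, experience collected in stage one stays usable after the switch to the full reward.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """Hypothetical transition record (fields are illustrative assumptions)."""
    distance_before: float  # distance to the goal before the step
    distance_after: float   # distance to the goal after the step
    collided: bool          # True if the step violated the collision constraint


def goal_reward(t: Transition) -> float:
    # Assumed goal term: progress made toward the goal in this step.
    return t.distance_before - t.distance_after


def constraint_penalty(t: Transition) -> float:
    # Assumed constraint term: fixed penalty on collision.
    return -10.0 if t.collided else 0.0


def stage_reward(t: Transition, stage: int) -> float:
    # Stage 1 uses a subset of the reward terms (assumed: goal term only).
    # Stage 2 uses the full reward (goal term plus constraint term).
    if stage == 1:
        return goal_reward(t)
    return goal_reward(t) + constraint_penalty(t)


class RelabelingReplayBuffer:
    """Stores transitions without rewards and computes rewards at sample time."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.storage: deque = deque(maxlen=capacity)  # oldest items drop first
        self.stage = 1
        self._rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self.storage.append(t)

    def advance_stage(self) -> None:
        # Old transitions are kept; only the reward definition changes.
        self.stage = 2

    def sample(self, batch_size: int) -> List[Tuple[Transition, float]]:
        # Uniform sampling with replacement, rewards relabeled for the
        # currently active stage.
        batch = self._rng.choices(list(self.storage), k=batch_size)
        return [(t, stage_reward(t, self.stage)) for t in batch]


buffer = RelabelingReplayBuffer(capacity=1000)
buffer.add(Transition(distance_before=2.0, distance_after=1.5, collided=True))
print(buffer.sample(1)[0][1])  # 0.5: stage 1 ignores the collision
buffer.advance_stage()
print(buffer.sample(1)[0][1])  # -9.5: same transition under the full reward
```

Recomputing rewards at sample time keeps the buffer contents valid across stages at the cost of one reward evaluation per sampled transition. Overwriting stored rewards once at the moment of transition would be an equally plausible design.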

arXiv information

Authors	Kilian Freitag, Kristian Ceder, Rita Laezza, Knut Åkesson, Morteza Haghir Chehreghani
Published	2024-10-22 08:07:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
