Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks

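Summary

This report presents a solution to the swing-up and stabilisation tasks for the acrobot and the pendubot, developed for the AI Olympics competition at IROS 2024. The approach uses Average-Reward Entropy Advantage Policy Optimization (AR-EAPO), a model-free reinforcement learning (RL) algorithm that combines average-reward RL with maximum entropy RL. The controller achieves better performance and robustness scores than established baseline methods in both the acrobot and pendubot scenarios, without requiring a heavily engineered reward function or a system model. The current results apply only to the simulation stage setup.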

Abstract (original)

This report presents a solution for the swing-up and stabilisation tasks of the acrobot and the pendubot, developed for the AI Olympics competition at IROS 2024. Our approach employs the Average-Reward Entropy Advantage Policy Optimization (AR-EAPO), a model-free reinforcement learning (RL) algorithm that combines average-reward RL and maximum entropy RL. Results demonstrate that our controller achieves improved performance and robustness scores compared to established baseline methods in both the acrobot and pendubot scenarios, without the need for a heavily engineered reward function or system model. The current results are applicable exclusively to the simulation stage setup.
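
The combination of average-reward RL and maximum entropy RL mentioned in the abstract can be illustrated with a minimal sketch. It is not the AR-EAPO algorithm itself, whose details are not given here. It only assumes the common formulation in which each reward is augmented with an entropy bonus, -temperature * log pi(a|s), and the resulting average (the gain) is subtracted to form differential returns. The function names, the `temperature` parameter, and the sample trajectory are all hypothetical.

```python
import math


def soft_average_reward(rewards, log_probs, temperature):
    """Estimate the entropy-regularised gain from one trajectory.

    Assumption: the soft reward at step t is r_t - temperature * log pi(a_t | s_t).
    """
    if not rewards or len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must be non-empty and equal in length")
    soft = [r - temperature * lp for r, lp in zip(rewards, log_probs)]
    return sum(soft) / len(soft)


def differential_returns(rewards, log_probs, temperature):
    """Return (gain, returns-to-go of soft reward minus gain) for one trajectory."""
    gain = soft_average_reward(rewards, log_probs, temperature)
    soft = [r - temperature * lp for r, lp in zip(rewards, log_probs)]
    returns = []
    running = 0.0
    # Accumulate from the last step backwards, without discounting.
    for value in reversed(soft):
        running += value - gain
        returns.append(running)
    returns.reverse()
    return gain, returns


if __name__ == "__main__":
    # Hypothetical trajectory: alternating rewards, uniform two-action policy.
    rewards = [1.0, 0.0, 1.0, 0.0]
    log_probs = [math.log(0.5)] * 4
    gain, returns = differential_returns(rewards, log_probs, temperature=0.1)
    print(f"gain = {gain:.4f}")
    print("differential returns =", [round(x, 4) for x in returns])
```

Subtracting the gain keeps the undiscounted returns bounded on long or continuing tasks, which is the usual motivation for the average-reward formulation; the entropy bonus rewards policies that remain stochastic.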

arXiv information

Authors	Jean Seong Bjorn Choe, Bumkyu Choi, Jong-kook Kim
Published	2024-09-13 15:56:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
