Revisiting Discrete Soft Actor-Critic

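Abstract

We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action spaces to discrete action spaces.
We revisit vanilla discrete SAC and develop an in-depth understanding of its Q-value underestimation and performance instability when applied to discrete settings.
We therefore propose Stable Discrete SAC (SDSAC), an algorithm that uses an entropy penalty and double average Q-learning with Q-clip to address these issues.
Extensive experiments on typical discrete-action benchmarks, including Atari games and a large-scale MOBA game, show the effectiveness of the proposed method.
The code is available at https://github.com/coldSummerday/SD-SAC.git.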

Abstract (original)

We study the adaption of Soft Actor-Critic (SAC), which is considered as a state-of-the-art reinforcement learning (RL) algorithm, from continuous action space to discrete action space. We revisit vanilla discrete SAC and provide an in-depth understanding of its Q value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action space, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
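The abstract names the components of SDSAC but gives no formulas. As a rough illustration only, the sketch below shows the soft state value commonly used in discrete SAC, where the expectation over actions is computed exactly, together with one plausible reading of "double average" as averaging two critic estimates instead of taking their minimum. The function names (`softmax`, `soft_state_value`, `td_target`), the `use_average` flag, and the averaging rule itself are assumptions made for this example, not the authors' implementation; the entropy penalty and Q-clip are not shown because the abstract does not define them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(logits, q1, q2, alpha, use_average=True):
    """Soft value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    q1 and q2 are per-action estimates from two critics (hypothetical inputs).
    use_average=True combines them by their mean (assumed "double average");
    use_average=False uses the minimum, as in clipped double Q-learning.
    """
    probs = softmax(logits)
    value = 0.0
    for p, a, b in zip(probs, q1, q2):
        q = 0.5 * (a + b) if use_average else min(a, b)
        value += p * (q - alpha * math.log(p))
    return value


def td_target(reward, done, next_value, gamma=0.99):
    """One-step bootstrapped target: r + gamma * (1 - done) * V(s')."""
    return reward + gamma * (1.0 - done) * next_value
```

With two actions, a uniform policy, and critics that disagree (`q1=[1, 3]`, `q2=[3, 1]`), the averaged value is higher than the minimum-based one, which gives an intuition for why the choice of combination rule affects underestimation.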

arXiv information

Authors	Haibin Zhou, Tong Wei, Zichuan Lin, junyou li, Junliang Xing, Yuanchun Shi, Li Shen, Chao Yu, Deheng Ye
Published	2024-11-20 13:52:42+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
