Soft Actor-Critic Algorithm with Truly Inequality Constraint

Summary

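Soft actor-critic (SAC), a reinforcement learning algorithm, is expected to become one of the next-generation robot control schemes.
Because it maximizes policy entropy, a robot controller trained with it becomes robust to noise and perturbation, which is useful in real-world robot applications.
However, in the current implementation the priority given to maximizing policy entropy is tuned automatically, and the tuning rule can be interpreted as an equality constraint that binds the policy entropy to a specified target value.
As a result, contrary to expectation, the current SAC no longer maximizes the policy entropy.
To resolve this issue, the paper improves the implementation with a slack variable that handles the inequality constraint appropriately, so that the policy entropy is maximized.
In the Mujoco and Pybullet simulators, the modified SAC achieved higher robustness and more stable learning than before while regularizing the norm of the action.
In addition, a variable impedance task on a real robot was demonstrated to show the applicability of the modified SAC to real-world robot control.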
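The equality-constraint reading of the automatic tuning rule can be illustrated with a minimal sketch of the temperature update commonly used in SAC implementations, which minimizes J(alpha) = E[-alpha * (log pi(a|s) + H_target)]. The function name `temperature_step`, the learning rate, and the sample log-probabilities below are hypothetical choices made for illustration. This sketches the standard rule only. It is not the slack-variable modification proposed in the paper, whose details the summary does not give.

```python
import math


def temperature_step(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on J(alpha) = mean(-alpha * (log_pi + target_entropy)).

    alpha is parameterized as exp(log_alpha) so that it stays positive
    (a common implementation choice, assumed here).
    """
    alpha = math.exp(log_alpha)
    # Monte Carlo estimate of the policy entropy: H = -E[log pi(a|s)]
    entropy_estimate = -sum(log_probs) / len(log_probs)
    # dJ/dalpha = H - H_target; the chain rule through alpha = exp(log_alpha)
    # multiplies this by alpha.
    grad_log_alpha = alpha * (entropy_estimate - target_entropy)
    return log_alpha - lr * grad_log_alpha


target = -1.0  # hypothetical target entropy

# Entropy estimate (-2.0) below the target: log_alpha increases.
below = temperature_step(0.0, [2.0, 2.0], target)
# Entropy estimate (0.0) above the target: log_alpha decreases.
above = temperature_step(0.0, [0.0, 0.0], target)

print(below > 0.0, above < 0.0)
```

The update raises the temperature when the entropy estimate is below the target and lowers it when the estimate is above, so the entropy is pulled toward the target from both sides. This matches the summary's description of a rule that binds the entropy to its target value rather than simply maximizing it.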

Summary (original)

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful for real-world robot applications. However, the priority of maximizing the policy entropy is automatically tuned in the current implementation, the rule of which can be interpreted as one for equality constraint, binding the policy entropy into its specified target value. The current SAC is therefore no longer maximize the policy entropy, contrary to our expectation. To resolve this issue in SAC, this paper improves its implementation with a slack variable for appropriately handling the inequality constraint to maximize the policy entropy. In Mujoco and Pybullet simulators, the modified SAC achieved the higher robustness and the more stable learning than before while regularizing the norm of action. In addition, a real-robot variable impedance task was demonstrated for showing the applicability of the modified SAC to real-world robot control.

arXiv information

Author	Taisuke Kobayashi
Published	2023-03-08 03:32:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
