Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

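Summary

In reinforcement learning (RL), exploiting environmental symmetries can greatly improve efficiency, robustness, and performance.
However, ensuring that deep RL policy and value networks are equivariant and invariant, respectively, so that these symmetries can be exploited, is a major challenge.
Related work tries to design networks that are equivariant and invariant by construction, but the very limited library of available components hampers the expressiveness of those networks.
This paper proposes a method, called equivariant ensembles, for constructing equivariant policies and invariant value functions without specialized neural network components.
It further adds a regularization term that introduces inductive bias during training.
A case study on map-based path planning shows how equivariant ensembles and regularization benefit sample efficiency and performance.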
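The summary does not spell out how an equivariant ensemble is built. One standard way to get an equivariant policy and an invariant value function out of an unconstrained network is group averaging: evaluate the network on every transformed copy of the state and average the results. The sketch below assumes that the whole state is a square grid map, that the symmetry group is only the four 90° rotations, and that there are four move actions encoded as 0=up, 1=right, 2=down, 3=left. `make_base_networks` is a hypothetical stand-in (random linear maps) for a trained network. This illustrates the general idea and is not necessarily the paper's exact construction.

```python
import numpy as np

N_ACTIONS = 4  # assumed encoding: 0=up, 1=right, 2=down, 3=left


def rotate_state(grid, k):
    """Rotate a square grid map by k * 90 degrees counterclockwise."""
    return np.rot90(grid, k)


def rotate_action(a, k):
    """Index of action a after rotating the map k * 90 degrees counterclockwise.

    With row 0 at the top, one CCW turn maps up->left, right->up,
    down->right, left->down, i.e. a -> (a - 1) mod 4.
    """
    return (a - k) % N_ACTIONS


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def make_base_networks(n, seed=0):
    """Hypothetical non-equivariant policy/value 'networks' (random linear maps)."""
    rng = np.random.default_rng(seed)
    w_pi = rng.normal(size=(N_ACTIONS, n * n))
    w_v = rng.normal(size=n * n)

    def policy(grid):
        return softmax(w_pi @ grid.ravel())

    def value(grid):
        return float(w_v @ grid.ravel())

    return policy, value


def ensemble_policy(policy, grid):
    """pi_eq(s)[a] = mean over rotations g of pi(g.s)[g.a]; equivariant for any policy."""
    actions = np.arange(N_ACTIONS)
    probs = np.zeros(N_ACTIONS)
    for k in range(4):
        probs += policy(rotate_state(grid, k))[rotate_action(actions, k)]
    return probs / 4


def ensemble_value(value, grid):
    """V_inv(s) = mean over rotations g of V(g.s); invariant for any value function."""
    return sum(value(rotate_state(grid, k)) for k in range(4)) / 4
```

Because the ensemble sums over the whole group, rotating the input only permutes the terms of the sum, so the result is equivariant (or invariant) no matter what the base network computes. The price is one forward pass per group element, four in this sketch. Averaging probabilities keeps the output normalized; averaging logits before the softmax would also preserve equivariance.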

Summary (original)

In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
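The abstract also does not give the form of the regularization term. A common choice, assumed here, penalizes the squared violation of equivariance and invariance over the group: for every rotation g, compare π(g·s) with the correspondingly permuted π(s), and V(g·s) with V(s). `equivariance_penalty` is a hypothetical name, and the sketch reuses the assumed 90° rotation group and action encoding (0=up, 1=right, 2=down, 3=left) on a square grid map. In real training it would be written in an autodiff framework and added to the RL loss with some weight; NumPy here only evaluates it.

```python
import numpy as np

N_ACTIONS = 4  # assumed encoding: 0=up, 1=right, 2=down, 3=left


def rotate_action(a, k):
    """Index of action a after rotating the map k * 90 degrees counterclockwise."""
    return (a - k) % N_ACTIONS


def equivariance_penalty(policy, value, grid):
    """Hypothetical regularizer: mean squared violation of policy equivariance
    and value invariance over the three non-identity 90-degree rotations."""
    p, v = policy(grid), value(grid)
    actions = np.arange(N_ACTIONS)
    total = 0.0
    for k in range(1, 4):  # the identity rotation (k=0) always contributes zero
        rotated = np.rot90(grid, k)
        p_rot = policy(rotated)
        # Equivariance: pi(g.s)[g.a] should equal pi(s)[a] for every action a.
        total += np.sum((p_rot[rotate_action(actions, k)] - p) ** 2)
        # Invariance: V(g.s) should equal V(s).
        total += (value(rotated) - v) ** 2
    return total / 3
```

An already equivariant policy, such as the uniform one, paired with a constant value function gets a penalty of zero, whereas a policy that always picks "up" regardless of the map is penalized. A training objective might then look like `rl_loss + lam * penalty`, where `lam` is a hypothetical weighting hyperparameter.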

arXiv information

Authors	Mirco Theile,Hongpeng Cao,Marco Caccamo,Alberto L. Sangiovanni-Vincentelli
Published	2024-03-19 16:01:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
