Domain Randomization via Entropy Maximization

Summary

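Varying dynamics parameters in simulation is a common Domain Randomization (DR) approach for bridging the reality gap in Reinforcement Learning (RL).
However, DR depends heavily on the choice of the sampling distribution for the dynamics parameters: high variability is important for regularizing the agent's behavior, but excessive randomization is known to produce overly conservative policies.
This paper proposes a new approach to sim-to-real transfer that automatically shapes the dynamics distribution during training in simulation, without requiring real-world data.
The authors introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities.
To achieve this, DORAEMON gradually increases the diversity of the sampled dynamics parameters as long as the success probability of the current policy remains sufficiently high.
Compared with representative baselines from the DR literature, the authors empirically validate DORAEMON's consistent benefits in obtaining highly adaptive and generalizable policies, i.e. policies that solve the task at hand across the widest range of dynamics parameters.
Notably, they also demonstrate DORAEMON's Sim2Real applicability through a successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.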
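
The constrained problem described in the summary can be written compactly. The form below is a sketch inferred from the abstract alone. The symbols are assumptions chosen for illustration: $\nu_\phi$ is the sampling distribution over dynamics parameters $\xi$ with parameters $\phi$, $\mathcal{H}$ is its entropy, $\pi$ is the current policy, and $\alpha$ is a success-probability threshold. The paper's exact formulation may include further terms or constraints.

```latex
\max_{\phi} \; \mathcal{H}\big(\nu_{\phi}\big)
\quad \text{s.t.} \quad
\mathbb{E}_{\xi \sim \nu_{\phi}}\big[ \Pr(\text{success} \mid \pi, \xi) \big] \;\ge\; \alpha
```

Read this way, widening $\nu_\phi$ raises the objective, while the constraint stops the distribution from spreading into regions of parameter space where the current policy fails too often.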

Summary (original)

Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent’s behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.
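
To make the idea of "gradually increasing diversity while the success probability stays high" concrete, here is a toy sketch. It is not the DORAEMON algorithm: per the abstract, the actual method solves a constrained optimization problem during policy training, whereas this example only widens a one-dimensional uniform distribution around a nominal value for a fixed, hypothetical policy. The functions `success`, `estimate_success_rate`, and `widen_while_successful`, and all constants (`nominal`, `tolerance`, `alpha`, `step`), are assumptions made up for illustration.

```python
import math
import random


def success(xi, nominal=1.0, tolerance=0.5):
    """Hypothetical stand-in for a policy rollout.

    The imagined fixed policy solves the task only when the dynamics
    parameter xi lies within `tolerance` of the nominal value.
    """
    return abs(xi - nominal) <= tolerance


def estimate_success_rate(low, high, rng, n_samples=2000):
    """Monte Carlo estimate of the success probability under Uniform(low, high)."""
    hits = sum(success(rng.uniform(low, high)) for _ in range(n_samples))
    return hits / n_samples


def widen_while_successful(nominal=1.0, width=0.05, step=0.05,
                           alpha=0.8, max_iters=100, seed=0):
    """Grow the half-width of the sampling distribution step by step.

    A wider uniform distribution has higher entropy, log(2 * width).
    A candidate width is accepted only if the estimated success rate
    stays at or above the threshold alpha; otherwise widening stops.
    """
    rng = random.Random(seed)
    history = []  # (accepted half-width, entropy, estimated success rate)
    for _ in range(max_iters):
        candidate = width + step
        rate = estimate_success_rate(nominal - candidate, nominal + candidate, rng)
        if rate < alpha:
            break  # constraint violated: keep the last accepted width
        width = candidate
        history.append((width, math.log(2 * width), rate))
    return width, history


if __name__ == "__main__":
    final_width, history = widen_while_successful()
    for w, entropy, rate in history:
        print(f"half-width={w:.2f}  entropy={entropy:+.3f}  success={rate:.3f}")
    print(f"final half-width: {final_width:.2f}")
```

In this toy setting the true success rate is `tolerance / width` once the half-width exceeds the tolerance, so with the defaults the constraint becomes binding near a half-width of 0.625. A real implementation would re-estimate success with the policy as it is being trained, so the feasible region itself would move over time.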

arXiv information

Authors	Gabriele Tiboni, Pascal Klink, Jan Peters, Tatiana Tommasi, Carlo D’Eramo, Georgia Chalvatzaki
Published	2024-03-26 12:59:44+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
