PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies

Summary

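Many real-world control problems, from autonomous driving to robotic manipulation, involve dynamics and objectives affected by unobservable hidden parameters.
To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems in which hidden variables parameterize the transition and reward functions.
Existing approaches such as domain randomization, domain adaptation, and meta-learning simply treat the effect of hidden parameters as additional variance and often struggle to handle HIP-MDP problems effectively, especially when the rewards are parameterized by hidden variables.
We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends existing model-based approaches by incorporating an explicit parameter estimation module.
PrivilegedDreamer features a novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data, which makes it possible to condition the model, actor, and critic networks on these estimated parameters.
Empirical analysis on five diverse HIP-MDP tasks shows that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms.
In addition, we conduct ablation studies to justify the inclusion of each component of the proposed architecture.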

Abstract (original)

Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module. PrivilegedDreamer features its novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data and enables us to condition the model, actor, and critic networks on these estimated parameters. Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms. Additionally, we conduct ablation studies to justify the inclusion of each component in the proposed architecture.
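
The HIP-MDP setting in the abstract can be illustrated with a toy sketch in which a single hidden parameter shapes both the transition and the reward, and is then estimated from a short history. This is not PrivilegedDreamer's architecture: the environment, the names `ToyHipMdp`, `collect_history`, and `estimate_theta`, and the least-squares estimator (a stand-in for the learned recurrent estimator) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyHipMdp:
    """Hypothetical 1-D environment with one hidden parameter, theta."""
    theta: float        # hidden parameter, never observed by the agent
    state: float = 0.0

    def step(self, action):
        # Transition depends on theta: an action moves the state by theta * action.
        self.state = self.state + self.theta * action
        # Reward depends on theta too: the goal position is theta itself.
        reward = -abs(self.state - self.theta)
        return self.state, reward


def collect_history(env, actions):
    """Roll out the given actions and record (state, action, next_state) tuples."""
    history = []
    for action in actions:
        state = env.state
        next_state, _ = env.step(action)
        history.append((state, action, next_state))
    return history


def estimate_theta(history):
    """Least-squares fit of theta in next_state - state = theta * action.

    Stand-in for a learned estimator; returns 0.0 if no informative action exists.
    """
    num = sum(a * (s_next - s) for s, a, s_next in history)
    den = sum(a * a for _, a, _ in history)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    env = ToyHipMdp(theta=1.5)
    history = collect_history(env, [1.0, -0.5, 0.25])
    print(estimate_theta(history))
```

In this toy case a policy that ignores the hidden parameter cannot know where the goal is, because the reward itself depends on it. A policy conditioned on the estimate can, which is the intuition behind conditioning the model, actor, and critic on estimated parameters.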

arXiv information

Authors	Morgan Byrd,Jackson Crandell,Mili Das,Jessica Inman,Robert Wright,Sehoon Ha
Published	2025-02-17 02:46:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
