Adversarial Environment Design via Regret-Guided Diffusion Models

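Summary

Training agents that are robust to environmental changes remains a major challenge in deep reinforcement learning (RL).
Unsupervised environment design (UED) has recently emerged to address this problem by generating a set of training environments tailored to the agent's capabilities.
Prior work has shown that UED has the potential to learn robust policies, but its performance is limited by the capabilities of the environment generator.
To this end, we propose a new UED algorithm: adversarial environment design via regret-guided diffusion models (ADD).
The proposed method guides a diffusion-based environment generator with the agent's regret, producing environments that the agent finds difficult but that help it improve further.
By leveraging the expressive power of diffusion models, ADD can directly generate adversarial environments while preserving the diversity of training environments, which lets the agent learn a robust policy effectively.
Our experimental results show that the proposed method successfully generates an instructive curriculum of environments and outperforms UED baselines in zero-shot generalization across novel, out-of-distribution environments.
Project page: https://github.com/rllab-snu.github.io/projects/ADD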

Summary (original)

Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning (RL). Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent’s capabilities. While prior works demonstrate that UED has the potential to learn a robust policy, their performance is constrained by the capabilities of the environment generation. To this end, we propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD). The proposed method guides the diffusion-based environment generator with the regret of the agent to produce environments that the agent finds challenging but conducive to further improvement. By exploiting the representation power of diffusion models, ADD can directly generate adversarial environments while maintaining the diversity of training environments, enabling the agent to effectively learn a robust policy. Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments. Project page: https://github.com/rllab-snu.github.io/projects/ADD
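The abstract does not spell out how the regret signal enters the sampler. As a rough illustration only, the Python sketch below treats regret the way classifier-guided diffusion treats a classifier's log-probability: at each reverse step, the denoising mean is shifted along the gradient of an estimated regret. This is a hypothetical toy, not the authors' implementation. The 2-D environment vector, the Gaussian-shaped return functions `v_opt` and `v_agent`, the N(0, I) environment prior (chosen so the noise predictor `eps_hat` is exact), the linear beta schedule, and `guidance_scale` are all assumptions made for the example. Regret is taken as the optimal return minus the agent's return, the usual UED definition.

```python
import numpy as np

# HYPOTHETICAL toy setup: an "environment" is a 2-D parameter vector x.
# v_opt / v_agent are made-up stand-ins for the optimal and current-agent returns.
S_OPT, S_AGENT = 2.0, 1.0  # the optimal policy succeeds over a wider region than the agent


def v_opt(x):
    return np.exp(-np.sum(x**2, axis=-1) / (2 * S_OPT**2))


def v_agent(x):
    return np.exp(-np.sum(x**2, axis=-1) / (2 * S_AGENT**2))


def regret(x):
    # Regret = optimal return - agent return.
    return v_opt(x) - v_agent(x)


def regret_grad(x):
    # Analytic gradient of regret(x) w.r.t. x for the toy Gaussian-shaped returns.
    return (-x / S_OPT**2) * v_opt(x)[..., None] + (x / S_AGENT**2) * v_agent(x)[..., None]


def eps_hat(x_t, alpha_bar_t):
    # Exact noise predictor when the (assumed) environment prior is N(0, I):
    # E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t.
    return np.sqrt(1.0 - alpha_bar_t) * x_t


def sample(n, guidance_scale, T=200, seed=0):
    """DDPM ancestral sampling with a classifier-guidance-style regret term."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((n, 2))
    for t in reversed(range(T)):
        eps = eps_hat(x, alpha_bars[t])
        # Standard DDPM posterior mean.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Guidance: nudge the mean up the regret gradient, scaled by the step variance.
        mean = mean + guidance_scale * betas[t] * regret_grad(x)
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


plain = sample(2000, guidance_scale=0.0)
guided = sample(2000, guidance_scale=20.0)
print("mean regret, unguided:", regret(plain).mean())
print("mean regret, guided:  ", regret(guided).mean())
```

In this toy, regret is zero at the origin (both policies succeed), near zero far away (both fail), and peaks in between, so the guided samples should drift toward that intermediate band and show a higher mean regret than the unguided ones. That loosely mirrors the abstract's goal of environments that are challenging but still conducive to improvement; a real generator would replace `eps_hat` with a trained denoiser and `regret_grad` with a learned or estimated regret signal.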

arXiv information

Authors	Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh
Published	2024-10-25 17:35:03+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
