Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Summary

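Benchmarks play an important role in the development and analysis of reinforcement learning (RL) algorithms. We find that existing benchmarks used for research on open-ended learning fall into one of two categories. Either they are too slow for meaningful research without enormous computational resources, like Crafter, NetHack, and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To address this problem, we first introduce Craftax-Classic, a rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A PPO run using 1 billion environment interactions finishes in under an hour on a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we introduce the Craftax benchmark, which substantially extends Crafter's mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning, and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration and unsupervised environment design, fail to make significant progress on the benchmark. We believe that Craftax will, for the first time, allow researchers to experiment in complex, open-ended environments with limited computational resources.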
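The PPO timing in the abstract can be turned into a rough lower bound on end-to-end throughput. Finishing 10^9 environment interactions in under one hour (3600 s) implies an average of at least

$$\frac{10^{9}\ \text{steps}}{3600\ \text{s}} \approx 2.8 \times 10^{5}\ \text{steps/s},$$

and this average covers the whole PPO run (environment stepping plus network updates), not environment simulation alone.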

Summary (original)

Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.
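A common way JAX-native environments reach speedups of this kind is to write `reset` and `step` as pure functions. Thousands of copies can then be batched with `jax.vmap`, the time loop can be kept on the accelerator with `jax.lax.scan`, and the whole rollout can be compiled with `jax.jit`. The sketch below shows only that general pattern, applied to a hypothetical toy gridworld. `EnvState`, `reset`, `step`, `rollout`, `MOVES`, and `GRID_SIZE` are illustrative names, not the Craftax API, and the toy dynamics have nothing to do with Crafter's actual mechanics.

```python
import functools
from typing import NamedTuple

import jax
import jax.numpy as jnp

GRID_SIZE = 8  # hypothetical toy world size, not Craftax's real map size


class EnvState(NamedTuple):
    pos: jnp.ndarray   # (2,) integer agent position
    goal: jnp.ndarray  # (2,) integer goal position


# Hypothetical action table: index -> (dx, dy) for left, right, up, down.
MOVES = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]], dtype=jnp.int32)


def reset(key):
    """Sample random agent and goal positions for one toy environment."""
    k_pos, k_goal = jax.random.split(key)
    pos = jax.random.randint(k_pos, (2,), 0, GRID_SIZE)
    goal = jax.random.randint(k_goal, (2,), 0, GRID_SIZE)
    return EnvState(pos=pos, goal=goal)


def step(state, action):
    """Pure step: move, clip to the grid, reward 1.0 whenever the new position is the goal."""
    new_pos = jnp.clip(state.pos + MOVES[action], 0, GRID_SIZE - 1)
    reward = jnp.all(new_pos == state.goal).astype(jnp.float32)
    return EnvState(pos=new_pos, goal=state.goal), reward


# vmap maps the single-environment functions over a leading batch axis;
# jit compiles the batched versions with XLA.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))


@functools.partial(jax.jit, static_argnums=(1, 2))
def rollout(key, num_envs, num_steps):
    """Run num_envs toy environments for num_steps random actions inside one compiled program."""
    reset_key, action_key = jax.random.split(key)
    states = batched_reset(jax.random.split(reset_key, num_envs))

    def body(states, k):
        actions = jax.random.randint(k, (num_envs,), 0, MOVES.shape[0])
        states, rewards = batched_step(states, actions)
        return states, rewards

    # scan keeps the time loop on the device instead of looping in Python.
    states, rewards = jax.lax.scan(body, states, jax.random.split(action_key, num_steps))
    return states, rewards  # rewards has shape (num_steps, num_envs)
```

For example, `rollout(jax.random.PRNGKey(0), 4096, 100)` steps 4096 toy environments 100 times without returning to Python between steps. This is only the general design choice such environments rely on. The sketch does not show how Craftax itself represents its world state or implements its mechanics.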

arXiv information

Authors	Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster
Published	2024-06-03 14:12:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
