Efficient World Models with Context-Aware Tokenization

Summary

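Scaling up deep reinforcement learning (RL) methods is a significant challenge.
Following developments in generative modelling, model-based RL has established itself as a strong contender.
Recent advances in sequence modelling have produced effective transformer-based world models, though at a heavy computational cost, because accurately simulating an environment requires long sequences of tokens.
This work proposes $\Delta$-IRIS, a new agent whose world model architecture consists of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens.
On the Crafter benchmark, $\Delta$-IRIS sets a new state of the art at multiple frame budgets while training an order of magnitude faster than previous attention-based approaches.
Code and models are released at https://github.com/vmicheli/delta-iris.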

Abstract (original)

Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose $\Delta$-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, $\Delta$-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.

arXiv information

Authors	Vincent Micheli, Eloi Alonso, François Fleuret
Published	2024-06-27 16:54:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
