Scaling Laws for Imitation Learning in Single-Agent Games

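Abstract

Imitation learning (IL) is one of the most widely used methods in machine learning.
Yet many studies find that it often fails to fully recover the underlying expert behavior, even in constrained environments such as single-agent games.
None of these studies, however, investigates in depth the role of scaling up model and data size.
Inspired by recent work in natural language processing (NLP), where "scaling up" has produced increasingly capable LLMs, we investigate whether carefully scaling up model and data size brings similar improvements to imitation learning in single-agent games.
We first demonstrate our findings on a variety of Atari games and then focus on the extremely challenging game of NetHack.
In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, which yields power laws for training compute-optimal IL agents.
Finally, we forecast and train several NetHack agents with IL and find that they outperform the prior state of the art by 1.5x in all settings.
Our work demonstrates both the scaling behavior of imitation learning across a variety of single-agent games and the viability of scaling up current approaches to obtain increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.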

Abstract (original)

Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where ‘scaling up’ has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by 1.5x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a variety of single-agent games, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
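The abstract states that the results take the form of power laws in the compute budget (FLOPs). As a rough illustration of what fitting and extrapolating such a law can look like, the sketch below fits y = a * x^b by ordinary least squares in log-log space. It is a minimal sketch and not the authors' procedure: the function name `fit_power_law`, the data points, and the coefficient and exponent used to generate them are all made up for illustration and are not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Hypothetical helper for illustration; all inputs must be positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


# Synthetic data only: compute budgets in FLOPs and a made-up quantity
# (think "compute-optimal model size") generated from an assumed law.
ASSUMED_A, ASSUMED_B = 0.05, 0.5         # hypothetical, not from the paper
flops = [1e13, 1e14, 1e15, 1e16, 1e17]
optimal_size = [ASSUMED_A * c ** ASSUMED_B for c in flops]

a, b = fit_power_law(flops, optimal_size)

# Extrapolate the fitted law to a larger, unseen compute budget.
forecast = a * (1e18) ** b
print(f"fitted a={a:.4g}, b={b:.4g}, forecast at 1e18 FLOPs={forecast:.4g}")
```

Because the synthetic points lie exactly on the assumed law, the fit recovers the generating coefficient and exponent up to floating-point error. Real measurements are noisy, so the fitted exponent and any forecast derived from it would carry uncertainty.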

arXiv information

Authors	Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
Published	2024-12-19 15:10:18+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
