Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

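Summary

Video games have served as useful benchmarks for the decision-making community, but moving beyond Atari games to modern games has been prohibitively expensive for most of the research community.
Prior work on modern video games typically relied either on game-specific integrations to obtain game features and enable online training, or on existing large-scale datasets.
An alternative approach is to use imitation learning to train agents that play video games purely from images.
This setting, however, raises a fundamental question: which visual encoders produce representations that retain the information critical for decision making?
To answer it, we conduct a systematic study of imitation learning with publicly available pre-trained visual encoders, compared against the typical task-specific end-to-end training approach, in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons.
Our results show that end-to-end training can be effective with relatively low-resolution images and only a few minutes of demonstrations, but that, depending on the game, pre-trained encoders such as DINOv2 can yield substantial improvements.
Beyond enabling effective decision making, we show that pre-trained encoders can make decision-making research in video games more accessible by substantially reducing the cost of training.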

Abstract (original)

Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards modern games has been prohibitively expensive for the vast majority of the research community. Prior work in modern video games typically relied on game-specific integration to obtain game features and enable online training, or on existing large datasets. An alternative approach is to train agents using imitation learning to play video games purely from images. However, this setting poses a fundamental question: which visual encoders obtain representations that retain information critical for decision making? To answer this question, we conduct a systematic study of imitation learning with publicly available pre-trained visual encoders compared to the typical task-specific end-to-end training approach in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons. Our results show that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations, but significant improvements can be gained by utilising pre-trained encoders such as DINOv2 depending on the game. In addition to enabling effective decision making, we show that pre-trained encoders can make decision-making research in video games more accessible by significantly reducing the cost of training.
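The abstract contrasts two ways of training an image-based imitation-learning agent: training the visual encoder end to end, or keeping a pre-trained encoder such as DINOv2 frozen and training only a policy on its embeddings. The following is a minimal, illustrative sketch of the second setup in pure Python, not the authors' implementation. The `FrozenEncoder` class (a fixed random projection standing in for DINOv2), the linear softmax policy, the synthetic 16-pixel "frames" and the two-action demonstrator are all hypothetical. The sketch also illustrates one assumed source of the cost savings the abstract mentions: because a frozen encoder never changes, each frame can be encoded once and the cached embeddings reused in every training epoch, whereas end-to-end training runs the encoder (forward and backward) on every frame in every epoch.

```python
import math
import random


class FrozenEncoder:
    """Hypothetical stand-in for a frozen pre-trained encoder such as DINOv2:
    a fixed random linear projection followed by tanh. Its weights are never
    updated; `calls` counts forward passes as a rough proxy for encoder compute."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.w = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, image))) for row in self.w]


def logits(w, b, z):
    return [sum(wk * zk for wk, zk in zip(w[k], z)) + b[k] for k in range(len(b))]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bc_policy(embeddings, actions, num_actions, epochs, lr=0.25):
    """Behaviour cloning: fit a linear softmax policy to (embedding, action)
    pairs by full-batch gradient descent on the mean cross-entropy loss."""
    dim, n = len(embeddings[0]), len(embeddings)
    w = [[0.0] * dim for _ in range(num_actions)]
    b = [0.0] * num_actions
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_actions)]
        grad_b = [0.0] * num_actions
        for z, a in zip(embeddings, actions):
            probs = softmax(logits(w, b, z))
            for k in range(num_actions):
                g = (probs[k] - (1.0 if k == a else 0.0)) / n  # dLoss/dlogit_k
                grad_b[k] += g
                for j in range(dim):
                    grad_w[k][j] += g * z[j]
        for k in range(num_actions):
            b[k] -= lr * grad_b[k]
            for j in range(dim):
                w[k][j] -= lr * grad_w[k][j]
    return w, b


def act(w, b, z):
    scores = logits(w, b, z)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical demonstrations: 16-pixel "frames"; the demonstrator picks
# action 0 when the left half is brighter than the right half, else action 1.
rng = random.Random(1)
frames = [[rng.random() for _ in range(16)] for _ in range(200)]
actions = [0 if sum(f[:8]) > sum(f[8:]) else 1 for f in frames]

epochs = 150
encoder = FrozenEncoder(in_dim=16, out_dim=32)
cache = [encoder(f) for f in frames]  # encode every frame exactly once
w, b = train_bc_policy(cache, actions, num_actions=2, epochs=epochs)

accuracy = sum(act(w, b, z) == a for z, a in zip(cache, actions)) / len(actions)
end_to_end_forward_passes = len(frames) * epochs  # encoder runs per frame per epoch
print(f"frozen encoder calls: {encoder.calls}, end-to-end forward passes: "
      f"{end_to_end_forward_passes}, train accuracy: {accuracy:.2f}")
```

With these settings the frozen encoder runs 200 times, once per frame, whereas an end-to-end setup repeating the same 150 epochs would need 200 × 150 = 30,000 encoder forward passes, each followed by a backward pass. A real system would replace the toy encoder with an actual pre-trained model and the linear head with a larger policy network; the specific architectures, resolutions and training details used in the paper are not described in this summary.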

arXiv information

Authors	Lukas Schäfer,Logan Jones,Anssi Kanervisto,Yuhan Cao,Tabish Rashid,Raluca Georgescu,Dave Bignell,Siddhartha Sen,Andrea Treviño Gavito,Sam Devlin
Published	2025-04-30 17:44:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google