Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction

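Summary

Developing a generalist agent is a longstanding goal in artificial intelligence.
Previous efforts that draw on extensive offline datasets from a variety of tasks have shown remarkable performance in multi-task scenarios within reinforcement learning. However, these efforts face challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectories into decision networks to provide task-specific context.
However, it has been observed that textual guidance or visual trajectories alone are insufficient to convey a task's contextual information accurately. This paper explores enhanced forms of task guidance that enable agents to understand gameplay instructions.
Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer.
Experimental results show that incorporating multimodal game instructions significantly enhances the decision transformer's multi-task and generalization capabilities.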

Summary (original)

Developing a generalist agent is a longstanding objective in artificial intelligence. Previous efforts utilizing extensive offline datasets from various tasks demonstrate remarkable performance in multitasking scenarios within Reinforcement Learning. However, these works encounter challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectory into decision networks to provide task-specific contextual cues, representing a promising direction. However, it is observed that relying solely on textual guidance or visual trajectory is insufficient for accurately conveying the contextual information of tasks. This paper explores enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a ‘read-to-play’ capability. Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer. Experimental results demonstrate that incorporating multimodal game instructions significantly enhances the decision transformer’s multitasking and generalization capabilities.

arXiv information

Authors	Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jiawei Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Wenhu Chen, Zhaofeng He, Jie Fu
Published	2024-02-06 17:09:25+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
