MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft

Summary

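To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces MCU, a task-centric framework for evaluating Minecraft agents.
The MCU framework uses the concept of atom tasks as basic building blocks, which makes it possible to generate diverse or even arbitrary tasks.
Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, and novelty).
These scores provide a multi-dimensional assessment of a task from different angles, so they can reveal an agent's capability on specific facets.
The difficulty scores also serve as features of each task, creating a meaningful task space and revealing the relationships between tasks.
To evaluate Minecraft agents efficiently with the MCU framework, we maintain a unified benchmark called SkillForge, which consists of representative tasks spanning diverse categories and a diverse distribution of difficulty.
We also provide convenient filters that let users select tasks to assess specific capabilities of an agent.
We show that MCU is expressive enough to cover all tasks used in recent literature on Minecraft agents, and we highlight the need for progress in areas such as creativity, precise control, and out-of-distribution generalization toward the goal of open-ended Minecraft agent development.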
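The summary says atom tasks act as building blocks for generating tasks, but it does not spell out how they combine. The following Python sketch assumes the simplest possible rule: a composite task is an ordered sequence of atom tasks, each modeled as a verb-target goal. `AtomTask`, `CompositeTask`, and `compose` are hypothetical names, and MCU's actual composition rules may be richer than plain sequencing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomTask:
    # Hypothetical model: an atom task is a single verb-target goal
    # that is not decomposed any further.
    verb: str
    target: str

    def describe(self) -> str:
        return f"{self.verb} {self.target}"


@dataclass(frozen=True)
class CompositeTask:
    # Assumption: a composite task is an ordered sequence of atom tasks
    # that must be completed one after another.
    steps: tuple[AtomTask, ...]

    def describe(self) -> str:
        return " then ".join(step.describe() for step in self.steps)


def compose(*steps: AtomTask) -> CompositeTask:
    """Build a composite task from one or more atom tasks, in the given order."""
    if not steps:
        raise ValueError("a composite task needs at least one atom task")
    return CompositeTask(steps=tuple(steps))


if __name__ == "__main__":
    task = compose(AtomTask("mine", "oak_log"), AtomTask("craft", "planks"))
    print(task.describe())  # mine oak_log then craft planks
```

Freezing both dataclasses keeps tasks hashable, so generated tasks could be deduplicated in a set; other combinators (for example parallel or conditional goals) would be further assumptions layered on the same structure.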

Summary (original)

To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces a task-centric framework named MCU for Minecraft agent evaluation. The MCU framework leverages the concept of atom tasks as fundamental building blocks, enabling the generation of diverse or even arbitrary tasks. Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, novelty). These scores offer a multi-dimensional assessment of a task from different angles, and thus can reveal an agent’s capability on specific facets. The difficulty scores also serve as the feature of each task, which creates a meaningful task space and unveils the relationship between tasks. For efficient evaluation of Minecraft agents employing the MCU framework, we maintain a unified benchmark, namely SkillForge, which comprises representative tasks with diverse categories and difficulty distribution. We also provide convenient filters for users to select tasks to assess specific capabilities of agents. We show that MCU has high expressivity to cover all tasks used in recent literature on Minecraft agents, and underscores the need for advancements in areas such as creativity, precise control, and out-of-distribution generalization under the goal of open-ended Minecraft agent development.
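Because every task carries six difficulty scores, a task can be treated as a point in a six-dimensional space, and a capability filter can select tasks by their score on one dimension. The Python sketch below illustrates that idea under stated assumptions: scores are plain numbers on an unspecified scale, similarity is Euclidean distance (the abstract names neither the scale nor a metric), and `ScoredTask`, `distance`, `nearest`, and `filter_tasks` are hypothetical helpers, not SkillForge's actual filter interface. The scores in the usage example are placeholders, not values from the benchmark.

```python
import math
from dataclasses import dataclass

# The six difficulty dimensions named in the abstract, in a fixed order.
DIMENSIONS = (
    "time_consumption",
    "operational_effort",
    "planning_complexity",
    "intricacy",
    "creativity",
    "novelty",
)


@dataclass(frozen=True)
class ScoredTask:
    name: str
    # One numeric score per entry in DIMENSIONS, same order (assumed numeric scale).
    scores: tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.scores) != len(DIMENSIONS):
            raise ValueError(f"expected {len(DIMENSIONS)} scores, got {len(self.scores)}")

    def score(self, dimension: str) -> float:
        return self.scores[DIMENSIONS.index(dimension)]


def distance(a: ScoredTask, b: ScoredTask) -> float:
    """Euclidean distance in the 6-D difficulty space (the metric is an assumption)."""
    return math.dist(a.scores, b.scores)


def nearest(task: ScoredTask, candidates: list[ScoredTask]) -> ScoredTask:
    """Return the candidate closest to `task`, skipping the task itself."""
    others = [c for c in candidates if c.name != task.name]
    return min(others, key=lambda c: distance(task, c))


def filter_tasks(tasks: list[ScoredTask], dimension: str, minimum: float) -> list[ScoredTask]:
    """Hypothetical filter: keep tasks scoring at least `minimum` on one dimension."""
    return [t for t in tasks if t.score(dimension) >= minimum]


if __name__ == "__main__":
    # Placeholder scores for illustration only; not taken from SkillForge.
    tasks = [
        ScoredTask("task_a", (1, 1, 1, 1, 1, 1)),
        ScoredTask("task_b", (2, 1, 1, 1, 1, 1)),
        ScoredTask("task_c", (5, 5, 5, 5, 5, 5)),
    ]
    print(nearest(tasks[0], tasks).name)  # task_b
    print([t.name for t in filter_tasks(tasks, "creativity", 3)])  # ['task_c']
```

Keeping the dimensions in one ordered tuple means the same vector serves both purposes the abstract mentions: per-dimension lookup for targeted filtering and whole-vector comparison for relating tasks to one another.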

arXiv information

Authors	Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
Published	2023-10-12 14:38:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google