AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Abstract

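Autonomous agents are becoming increasingly important for interacting with the real world.
Android agents in particular have recently become a frequently discussed means of interaction.
However, existing work on training and evaluating Android agents lacks a systematic study covering both open-source and closed-source models.
In this work, we propose AndroidLab, a systematic Android agent framework.
It includes an operation environment with different modalities, an action space, and a reproducible benchmark.
It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space.
The AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices.
Using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, raising the average success rate from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs.
AndroidLab is open source and publicly available at https://github.com/THUDM/Android-Lab.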

Abstract (original)

Autonomous agents have become increasingly important for interacting with the real world. Android agents, in particular, have been recently a frequently-mentioned interaction method. However, existing studies for training and evaluating Android agents lack systematic research on both open-source and closed-source models. In this work, we propose AndroidLab as a systematic Android agent framework. It includes an operation environment with different modalities, action space, and a reproducible benchmark. It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space. AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices. By using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, lifting the average success rates from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs. AndroidLab is open-sourced and publicly available at https://github.com/THUDM/Android-Lab.

arXiv information

Authors	Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, Yuxiao Dong
Published	2024-10-31 15:25:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
