GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

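Summary

Smartphone users often move between multiple applications (apps) to complete tasks such as sharing content across social media platforms.
Autonomous graphical user interface (GUI) navigation agents can improve the user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention.
However, earlier GUI agents were often trained on datasets of simple tasks that can be completed within a single app, which led to poor performance in cross-app navigation.
To address this problem, the authors introduce GUI Odyssey, a comprehensive dataset for training and evaluating cross-app navigation agents.
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, covering 6 types of cross-app tasks, 201 apps, and 1.4K app combinations.
Using GUI Odyssey, they developed OdysseyAgent, a multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a history resampling module.
Extensive experiments show that OdysseyAgent is more accurate than existing models.
For example, OdysseyAgent outperforms fine-tuned Qwen-VL and zero-shot GPT-4V by 1.44% and 55.49% in in-domain accuracy, and by 2.29% and 48.14% in out-of-domain accuracy, on average.
The dataset and code will be released at https://github.com/OpenGVLab/GUI-Odyssey.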

Abstract (original)

Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention. However, prior GUI agents were often trained with datasets comprising simple tasks that can be completed within a single app, leading to poor performance in cross-app navigation. To address this problem, we introduce GUI Odyssey, a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos. Leveraging GUI Odyssey, we developed OdysseyAgent, a multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a history resampling module. Extensive experiments demonstrate OdysseyAgent’s superior accuracy compared to existing models. For instance, OdysseyAgent surpasses fine-tuned Qwen-VL and zero-shot GPT-4V by 1.44% and 55.49% in in-domain accuracy, and by 2.29% and 48.14% in out-of-domain accuracy, on average. The dataset and code will be released at https://github.com/OpenGVLab/GUI-Odyssey.

arXiv information

Authors	Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo
Published	2024-06-12 17:44:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
