A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

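Summary

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. On real-world websites, however, their performance still suffers from three problems: (1) the open-ended domain, (2) limited context length, and (3) a lack of inductive bias for HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites by following natural language instructions. WebAgent plans ahead by decomposing an instruction into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites through Python programs generated from those sub-instructions and snippets. WebAgent is built from two models: Flan-U-PaLM for grounded code generation, and HTML-T5 for planning and summarization. HTML-T5 is a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives. We empirically show that this modular recipe improves success on real websites by over 50%, and that HTML-T5 is the best model for a range of HTML understanding tasks: it achieves an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and state-of-the-art performance on Mind2Web, an offline task planning evaluation.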

Abstract (original)

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.
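
The plan, summarize, and act loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three functions below are hypothetical rule-based stand-ins for the models (HTML-T5 for planning and summarization, Flan-U-PaLM for code generation), the `click` action and the " then " separator are assumptions made only for this example, and the "browser" is just a Python list that records clicks.

```python
from html.parser import HTMLParser
from typing import List, Optional


class StartTagCollector(HTMLParser):
    """Records the attributes of every start tag in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[dict] = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(dict(attrs))


def plan(instruction: str, done: List[str]) -> Optional[str]:
    """Toy planner (hypothetical): ' then ' separates sub-instructions."""
    steps = [s.strip() for s in instruction.split(" then ")]
    return steps[len(done)] if len(done) < len(steps) else None


def summarize(html: str, sub_instruction: str) -> List[dict]:
    """Toy summarizer (hypothetical): keep elements whose id is named in the step."""
    collector = StartTagCollector()
    collector.feed(html)
    words = set(sub_instruction.lower().split())
    return [a for a in collector.elements if (a.get("id") or "").lower() in words]


def synthesize(sub_instruction: str, snippets: List[dict]) -> str:
    """Toy code generator (hypothetical): one click(...) call per kept element."""
    return "\n".join(f"click({s['id']!r})" for s in snippets)


def run_episode(instruction: str, html: str) -> List[str]:
    """Runs plan -> summarize -> synthesize -> execute until the plan is exhausted."""
    clicked: List[str] = []  # stands in for a real browser session
    done: List[str] = []
    while (step := plan(instruction, done)) is not None:
        program = synthesize(step, summarize(html, step))
        # Execute the generated program with only the click action exposed.
        exec(program, {"__builtins__": {}, "click": clicked.append})
        done.append(step)
    return clicked


if __name__ == "__main__":
    page = '<form><input id="search"><button id="submit">Go</button><a id="ads">x</a></form>'
    print(run_episode("click search then click submit", page))
    # prints ['search', 'submit']
```

The point of the sketch is the division of labor: the code generator only ever sees the current sub-instruction and the filtered snippets, never the full page, which is how the modular design works around limited context length.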

arXiv information

Authors	Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust
Published	2023-10-03 03:51:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
