NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild



We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents. Given any website, NNetNav produces these demonstrations by retroactively labeling action sequences from an exploration policy. Most work on training browser agents has relied on expensive human supervision, and the limited prior work on such interaction-based techniques has failed to provide effective search through the exponentially large space of exploration. In contrast, NNetNav exploits the hierarchical structure of language instructions to make this search more tractable: complex instructions are typically decomposable into simpler sub-tasks, allowing NNetNav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. LLama-3.1-8b finetuned on 10k NNetNav self-generated demonstrations achieves a success rate of over 16% on WebArena and 35% on WebVoyager, improvements of 15 and 31 points respectively over zero-shot LLama-3.1-8b. It outperforms zero-shot GPT-4 and reaches the state of the art among unsupervised methods on both benchmarks.
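The explore, prune, and relabel loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Every name here (`explore_with_pruning`, `reset`, `step`, `propose_action`, `label_subtask`, `max_steps`, `check_every`) is a hypothetical placeholder. In practice the exploration policy and the labeler would presumably be language models and the environment a real browser.

```python
from typing import Callable, List, Optional, Tuple

# One step of an episode: (observation seen, action taken).
Step = Tuple[str, str]


def explore_with_pruning(
    reset: Callable[[], str],
    step: Callable[[str], str],
    propose_action: Callable[[str], str],
    label_subtask: Callable[[List[Step]], Optional[str]],
    max_steps: int = 12,
    check_every: int = 4,
) -> Optional[Tuple[str, List[Step]]]:
    """Roll out an exploration policy and retroactively label the result.

    All callables are hypothetical stand-ins (assumptions, not a real API):
      reset()             -> initial observation of the website
      step(action)        -> next observation after executing the action
      propose_action(obs) -> next action chosen by the exploration policy
      label_subtask(traj) -> an instruction describing the trajectory so far,
                             or None if no meaningful sub-task fits it
    Returns (instruction, trajectory), or None if the episode was pruned.
    """
    obs = reset()
    trajectory: List[Step] = []
    for t in range(1, max_steps + 1):
        action = propose_action(obs)
        trajectory.append((obs, action))
        obs = step(action)
        # Intermediate checkpoint: abandon the episode early when the
        # partial trajectory cannot be annotated with a sub-task.
        if t % check_every == 0 and t < max_steps:
            if label_subtask(trajectory) is None:
                return None
    # Hindsight labeling of the completed trajectory.
    instruction = label_subtask(trajectory)
    if instruction is None:
        return None
    return instruction, trajectory
```

Each non-pruned episode yields one (instruction, trajectory) pair that could serve as a synthetic demonstration for finetuning. The early return is what keeps the search tractable in this sketch: an unlabelable prefix stops consuming further environment steps.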


Authors: Shikhar Murty, Hao Zhu, Dzmitry Bahdanau, Christopher D. Manning
Published: 2025-02-05 18:56:51+00:00

Category: cs.CL