WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

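Summary

Web agents built on large language models (LLMs) are promising for next-generation AI, but their limited reasoning in uncertain, dynamic web environments prevents robust deployment.
This paper identifies the reasoning skills that effective web agents need, namely reflection and lookahead, branching, and rollback, and curates trajectory data that exemplifies them by rewriting the agent's inference-time reasoning algorithms as chain-of-thought rationales.
Experiments on OpenWebVoyager, an agent self-improvement benchmark, show that distilling these reasoning patterns into the backbone LLM through simple fine-tuning substantially improves performance.
The approach yields significant gains on several benchmarks, including WebVoyager, Mind2Web-Live, and SimpleQA (web search), which points to the potential of targeted reasoning-skill enhancement for web agents.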

Abstract (original)

Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent’s (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2web-live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill enhancement for web agents.
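
The abstract does not spell out how a search procedure becomes a chain-of-thought rationale, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy site (`SITE`), a depth-first search standing in for the agent's inference-time algorithm, and invented step labels; each branching, rollback, and reflection step is written out as a line of text, and the lines are joined into one flat rationale.

```python
from typing import Dict, List

# Hypothetical toy "website": each page maps to the pages reachable from it.
# Nothing here comes from the paper; it only illustrates the three skills.
SITE: Dict[str, List[str]] = {
    "home": ["blog", "shop"],
    "blog": [],
    "shop": ["cart", "checkout"],
    "cart": [],
    "checkout": [],
}


def explore(page: str, goal: str, trace: List[str]) -> bool:
    """Depth-first search that logs every reasoning step as plain text."""
    if page == goal:
        trace.append(f"Reflection: '{page}' satisfies the task, so stop.")
        return True
    options = SITE.get(page, [])
    if not options:
        trace.append(f"Reflection: '{page}' is a dead end.")
        return False
    # Branching: list the candidate next pages before committing to one.
    trace.append(f"Branching: from '{page}' the candidates are {options}.")
    for nxt in options:
        trace.append(f"Action: go from '{page}' to '{nxt}'.")
        if explore(nxt, goal, trace):
            return True
        # Rollback: the branch failed, so return to the earlier page.
        trace.append(f"Rollback: return from '{nxt}' to '{page}'.")
    return False


def build_rationale(start: str, goal: str) -> str:
    """Flatten the search into a single chain-of-thought style string."""
    trace: List[str] = []
    found = explore(start, goal, trace)
    trace.append("Result: goal reached." if found else "Result: goal not found.")
    return "\n".join(trace)


print(build_rationale("home", "checkout"))
```

In a fine-tuning setup, a string like this would be paired with the observation that produced it; how the paper actually formats or filters such data is not stated in the abstract.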

arXiv information

Authors	Minda Hu, Tianqing Fang, Jianshu Zhang, Junyu Ma, Zhisong Zhang, Jingyan Zhou, Hongming Zhang, Haitao Mi, Dong Yu, Irwin King
Published	2025-05-26 14:03:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
