ProgRM: Build Better GUI Agents with Progress Rewards

Summary

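LLM-based (Large Language Model) GUI (Graphical User Interface) agents have the potential to significantly change our daily lives.
However, current LLM-based GUI agents suffer from a scarcity of high-quality training data, because both collecting trajectories and annotating rewards are difficult.
Existing works have explored using LLMs either to collect trajectories for imitation learning or to provide reward signals for online RL training.
However, the Outcome Reward Model (ORM) used in existing works cannot provide fine-grained feedback and can over-penalize valuable steps in trajectories that ultimately fail.
To address this, we propose the Progress Reward Model (ProgRM), which provides dense, informative intermediate rewards by predicting the task-completion progress at each step of online training.
To handle the challenge of annotating progress reward labels, we further design an efficient self-annotation algorithm based on the LCS (Longest Common Subsequence) that discovers the key steps in trajectories and assigns progress labels accordingly.
ProgRM is evaluated through extensive experiments and analyses.
Actors trained with ProgRM outperform leading proprietary LLMs and ORM-trained actors, demonstrating the effectiveness of ProgRM.
The code for the experiments will be made publicly available upon acceptance.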
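To make the contrast between outcome rewards and progress rewards concrete, here is a minimal sketch, not the authors' implementation. It assumes an ORM-style signal that rewards only the final step (1 on success, 0 on failure), and it assumes that per-step progress estimates in [0, 1] are turned into rewards as the increase in progress over the previous step. The abstract does not say how ProgRM's predicted progress is converted into a reward, so that shaping rule and both function names are hypothetical.

```python
from typing import List


def outcome_rewards(num_steps: int, success: bool) -> List[float]:
    """Sparse ORM-style signal (assumed form): only the final step is rewarded."""
    rewards = [0.0] * num_steps
    if num_steps > 0:
        rewards[-1] = 1.0 if success else 0.0
    return rewards


def progress_rewards(progress: List[float]) -> List[float]:
    """Dense signal from per-step progress estimates.

    Hypothetical shaping rule: r_t = p_t - p_{t-1}, with p_0 = 0, so the
    rewards along a trajectory sum to the final progress value.
    """
    rewards = []
    previous = 0.0
    for p in progress:
        rewards.append(p - previous)
        previous = p
    return rewards


if __name__ == "__main__":
    # A failed 4-step trajectory that still got halfway through the task.
    print(outcome_rewards(4, success=False))
    print(progress_rewards([0.25, 0.5, 0.5, 0.5]))
```

Under these assumptions, the failed trajectory receives no credit at all from the outcome signal, while the progress signal still credits the first two steps that moved the task forward and gives nothing to the steps where the agent stalled.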

Summary (original)

LLM-based (Large Language Model) GUI (Graphical User Interface) agents can potentially reshape our daily lives significantly. However, current LLM-based GUI agents suffer from the scarcity of high-quality training data owing to the difficulties of trajectory collection and reward annotation. Existing works have been exploring LLMs to collect trajectories for imitation learning or to offer reward signals for online RL training. However, the Outcome Reward Model (ORM) used in existing works cannot provide fine-grained feedback and can over-penalize the valuable steps in finally failed trajectories. To this end, we propose Progress Reward Model (ProgRM) to provide dense informative intermediate rewards by predicting a task completion progress for each step in online training. To handle the challenge of progress reward label annotation, we further design an efficient LCS-based (Longest Common Subsequence) self-annotation algorithm to discover the key steps in trajectories and assign progress labels accordingly. ProgRM is evaluated with extensive experiments and analyses. Actors trained with ProgRM outperform leading proprietary LLMs and ORM-trained actors, illustrating the effectiveness of ProgRM. The code for the experiments will be made publicly available upon acceptance.
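The abstract names an LCS-based self-annotation algorithm but gives no details, so the following is only a hypothetical reading of the idea. It assumes that key steps are the actions shared, in order, by two successful trajectories for the same task (their longest common subsequence), and that the progress label at step t is the fraction of key steps matched by the trajectory prefix up to t. The function names and action strings are illustrative, not from the paper. One DP table gives the LCS length for every prefix at once, which keeps labeling a trajectory at O(n·m) time.

```python
from typing import Hashable, List, Sequence


def lcs_table(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[List[int]]:
    """dp[i][j] = length of the LCS of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def lcs(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[Hashable]:
    """Recover one longest common subsequence by backtracking through the table."""
    dp = lcs_table(a, b)
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


def progress_labels(trajectory: Sequence[Hashable],
                    key_steps: Sequence[Hashable]) -> List[float]:
    """Hypothetical label: share of key steps matched (in order) by each prefix."""
    if not key_steps:
        raise ValueError("key_steps must be non-empty")
    dp = lcs_table(trajectory, key_steps)
    k = len(key_steps)
    return [dp[i][k] / k for i in range(1, len(trajectory) + 1)]


if __name__ == "__main__":
    success_a = ["open_settings", "tap_wifi", "scroll", "toggle_wifi", "back"]
    success_b = ["open_settings", "search", "tap_wifi", "toggle_wifi"]
    key = lcs(success_a, success_b)
    failed = ["open_settings", "tap_wifi", "scroll", "back"]
    print(key)
    print(progress_labels(failed, key))
```

In this toy example, the failed trajectory still earns labels of 1/3 and then 2/3 for its first two steps because it completes two of the three key steps, which is the kind of partial credit that a pure outcome label would discard.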

arXiv information

Authors	Danyang Zhang, Situo Zhang, Ziyue Yang, Zichen Zhu, Zihan Zhao, Ruisheng Cao, Lu Chen, Kai Yu
Published	2025-05-23 17:23:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
