DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

Summary

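On-device control agents, especially those on mobile devices, operate the device to fulfill users' requests, enabling seamless and intuitive interaction.
Integrating Multimodal Large Language Models (MLLMs) into these agents improves their ability to understand and execute complex commands, which in turn improves the user experience.
However, fine-tuning MLLMs for on-device control is challenging because available data is limited and online training processes are inefficient.
This paper introduces DistRL, a new framework designed to make online RL fine-tuning of mobile device control agents more efficient.
DistRL combines centralized training with decentralized data acquisition to keep fine-tuning efficient under dynamic online interaction.
The framework is also backed by a tailor-made RL algorithm that balances exploration against prioritized use of the collected data to keep training stable and robust.
Our experiments show that, on average, DistRL delivers a 3x improvement in training efficiency and collects training data 2.4x faster than the leading synchronous multi-machine methods.
Notably, after training, DistRL achieves a 20% relative improvement in success rate over state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches within the same training time.
These results support DistRL as a scalable and efficient solution that substantially improves both training efficiency and agent performance on real-world device control tasks.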
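
The split between centralized training and decentralized, asynchronous data acquisition can be pictured as a generic producer-consumer pattern. The sketch below is only an illustration of that pattern, not DistRL's implementation or its RL algorithm: the names `worker`, `learner`, and `run`, the tuple layout of a "trajectory", and the batch size are all hypothetical, and the gradient step and any prioritization of collected data are left out.

```python
import queue
import random
import threading


def worker(worker_id, out_queue, episodes, seed):
    """Hypothetical data-collection worker.

    Pushes stand-in trajectories to a shared queue without ever
    waiting for the learner (asynchronous collection).
    """
    rng = random.Random(seed)
    for episode in range(episodes):
        # Stand-in trajectory: (worker id, episode index, scalar reward).
        out_queue.put((worker_id, episode, rng.random()))
    out_queue.put(None)  # sentinel: this worker has finished


def learner(in_queue, num_workers, batch_size):
    """Hypothetical central learner.

    Consumes trajectories in whatever order they arrive and counts
    one "update" per full batch. A real learner would run a
    gradient step where the counter is incremented.
    """
    finished = 0
    seen = 0
    updates = 0
    batch = []
    while finished < num_workers:
        item = in_queue.get()
        if item is None:
            finished += 1
            continue
        batch.append(item)
        seen += 1
        if len(batch) == batch_size:
            updates += 1  # placeholder for a training step
            batch.clear()
    return seen, updates


def run(num_workers=4, episodes=5, batch_size=4):
    """Start the workers, run the learner, and return (seen, updates)."""
    q = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, q, episodes, i))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    result = learner(q, num_workers, batch_size)
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    seen, updates = run()
    print(f"trajectories consumed: {seen}, learner updates: {updates}")
```

Because the workers never block on the learner, a slow worker does not stall the others, which is the general motivation for asynchronous collection over a synchronous scheme in which every worker waits at each round.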

Summary (original)

On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users’ requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control presents significant challenges due to limited data availability and inefficient online training processes. This paper introduces DistRL, a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in the context of dynamic online interactions. Additionally, the framework is backed by our tailor-made RL algorithm, which effectively balances exploration with the prioritized utilization of collected data to ensure stable and robust training. Our experiments show that, on average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate compared to state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches while maintaining the same training time. These results validate DistRL as a scalable and efficient solution, offering substantial improvements in both training efficiency and agent performance for real-world, in-the-wild device control tasks.

arXiv information

Authors	Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao
Published	2024-11-12 14:57:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
