FlowReasoner: Reinforcing Query-Level Meta-Agents

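Summary

This paper proposes FlowReasoner, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query.
Our core idea is to incentivize a reasoning-based meta-agent through external execution feedback.
Concretely, we first endow FlowReasoner with basic reasoning ability for generating multi-agent systems by distilling DeepSeek R1.
We then further enhance it through reinforcement learning (RL) with external execution feedback.
A multi-purpose reward is designed to guide RL training in terms of performance, complexity, and efficiency.
In this way, FlowReasoner can generate a personalized multi-agent system for each user query through deliberative reasoning.
Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner.
Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.
The code is available at https://github.com/sail-sg/FlowReasoner.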

Summary (original)

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow the basic reasoning ability regarding the generation of multi-agent systems to FlowReasoner. Then, we further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training from aspects of performance, complexity, and efficiency. In this manner, FlowReasoner is enabled to generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
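The abstract names the three aspects of the multi-purpose reward (performance, complexity, efficiency) but not how they are combined. The following is a minimal Python sketch, assuming a weighted sum in which performance is the test-case pass rate of the generated multi-agent system, complexity is its number of agents, and efficiency is its number of LLM calls. The `ExecutionFeedback` class, the `multi_purpose_reward` function, the weights, and the normalization caps are all hypothetical illustrations and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class ExecutionFeedback:
    """Hypothetical summary of executing one generated multi-agent system."""
    tests_passed: int    # test cases the system's output passed
    tests_total: int     # test cases attempted
    num_agents: int      # size of the generated system (assumed complexity proxy)
    num_llm_calls: int   # LLM calls made during execution (assumed efficiency proxy)


def multi_purpose_reward(
    fb: ExecutionFeedback,
    w_perf: float = 1.0,        # hypothetical weight on performance
    w_complexity: float = 0.1,  # hypothetical weight on the complexity penalty
    w_efficiency: float = 0.05, # hypothetical weight on the cost penalty
    max_agents: int = 10,       # hypothetical cap used to normalize complexity
    max_calls: int = 50,        # hypothetical cap used to normalize cost
) -> float:
    """Assumed form: reward performance, penalize complexity and cost."""
    if fb.tests_total <= 0:
        raise ValueError("tests_total must be positive")
    # Performance: fraction of test cases passed, in [0, 1].
    performance = fb.tests_passed / fb.tests_total
    # Complexity penalty: normalized system size, clipped to [0, 1].
    complexity = min(fb.num_agents / max_agents, 1.0)
    # Efficiency penalty: normalized number of LLM calls, clipped to [0, 1].
    cost = min(fb.num_llm_calls / max_calls, 1.0)
    return w_perf * performance - w_complexity * complexity - w_efficiency * cost


# Two systems with equal accuracy: the smaller, cheaper one should score higher.
small = multi_purpose_reward(ExecutionFeedback(8, 10, 3, 12))
large = multi_purpose_reward(ExecutionFeedback(8, 10, 8, 40))
print(small > large)
```

With these assumed weights, two systems that pass the same 8 of 10 tests are ranked by size and cost, so the 3-agent, 12-call system scores higher than the 8-agent, 40-call one. A weighted sum is only one plausible way to trade accuracy against complexity and efficiency.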

arXiv information

Authors	Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang
Published	2025-04-21 17:35:42+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google