HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

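Summary

Large language models (LLMs) have revolutionized software engineering (SE), showing remarkable proficiency across a wide variety of coding tasks.
Although recent advances have made it possible to build autonomous software agents that use LLMs for end-to-end development tasks, these systems are usually designed for specific SE functions.
We introduce HyperAgent, an innovative generalist multi-agent system that mimics the workflows of human developers to tackle a wide range of SE tasks across different programming languages.
HyperAgent comprises four specialized agents (Planner, Navigator, Code Editor, and Executor) that together can handle the entire lifecycle of an SE task, from initial planning to final verification.
HyperAgent sets new benchmarks on diverse SE tasks, including GitHub issue resolution on the well-known SWE-Bench benchmark, where it outperforms strong baselines.
HyperAgent also performs strongly on repository-level code generation (RepoExec) and on fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.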

Summary (original)

Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents-Planner, Navigator, Code Editor, and Executor-capable of handling the entire lifecycle of SE tasks, from initial planning to final verification. HyperAgent sets new benchmarks in diverse SE tasks, including GitHub issue resolution on the renowned SWE-Bench benchmark, outperforming robust baselines. Furthermore, HyperAgent demonstrates exceptional performance in repository-level code generation (RepoExec) and fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.
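The abstract names the four agent roles but gives no interface details. The following toy Python sketch illustrates only the general pattern of a planner delegating to navigation, editing, and execution roles. It is not HyperAgent's implementation. Every function name and signature, the fixed locate, edit, verify plan, the `Repo` dictionary, and the string-replacement "fix" are hypothetical assumptions, and no LLM is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy sketch: each "agent" is a plain function, not an LLM.
# Only the role names come from the abstract; all interfaces are assumed.

Repo = Dict[str, str]  # file path -> file contents (assumed representation)


@dataclass
class Trace:
    """Records which role acted at each step."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, agent: str, message: str) -> None:
        self.steps.append((agent, message))


def navigator(repo: Repo, query: str) -> List[str]:
    """Locate files whose contents mention the query string."""
    return sorted(path for path, text in repo.items() if query in text)


def code_editor(repo: Repo, path: str, old: str, new: str) -> None:
    """Apply a textual replacement to one file (mutates repo in place)."""
    repo[path] = repo[path].replace(old, new)


def executor(repo: Repo, check: Callable[[Repo], bool]) -> bool:
    """Run a verification check against the current repository state."""
    return check(repo)


def planner(repo: Repo, bug: str, fix: str,
            check: Callable[[Repo], bool]) -> Tuple[bool, Trace]:
    """Fixed plan: locate -> edit -> verify. Unlike a real planner,
    this stub never revises its plan based on intermediate results."""
    trace = Trace()
    files = navigator(repo, bug)
    trace.log("Navigator", f"found {files}")
    for path in files:
        code_editor(repo, path, bug, fix)
        trace.log("Code Editor", f"edited {path}")
    ok = executor(repo, check)
    trace.log("Executor", "passed" if ok else "failed")
    return ok, trace


def add_test(r: Repo) -> bool:
    """Example check: load calc.py and test add(2, 3) == 5."""
    ns: dict = {}
    exec(r["calc.py"], ns)
    return ns["add"](2, 3) == 5


# Demo: a one-line bug in a toy two-file repository.
demo_repo: Repo = {
    "calc.py": "def add(a, b):\n    return a - b\n",
    "README.md": "adds numbers",
}
ok, trace = planner(demo_repo, "return a - b", "return a + b", add_test)
for agent, message in trace.steps:
    print(f"{agent}: {message}")
print("resolved:", ok)
```

The fixed plan is the main simplification. A real agent system would presumably pick and revise its steps according to what navigation and execution report; this stub always runs a single locate, edit, verify pass, which keeps the division of responsibilities easy to see.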

arXiv information

Authors	Huy Nhat Phan, Tien N. Nguyen, Phong X. Nguyen, Nghi D. Q. Bui
Published	2024-11-05 17:22:10+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
