AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Summary

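Progress in natural language processing (NLP) has been driven in large part by transformer-based large language models (LLMs).
These models have transformed NLP tasks, code generation in particular, and help developers write software more efficiently.
Even so, balancing code snippet generation against effective test case generation and execution remains a challenge.
To address this, the paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a multi-agent framework with three specialized agents: the programmer agent, the test designer agent, and the test executor agent.
During coding, the programmer agent focuses on generating code and refining it based on feedback from the test executor agent.
The test designer agent generates test cases for the generated code, and the test executor agent runs the code against those test cases and writes feedback to the programmer agent.
This collaborative system is designed to produce robust code, overcoming the limitations of single-agent models and traditional methodologies.
Extensive experiments on 9 code generation models and 12 enhancement approaches show that AgentCoder outperforms existing code generation models and prompt engineering techniques across various benchmarks.
For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 on the HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, respectively, whereas the state of the art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.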
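
The interaction between the three agents can be pictured as a simple feedback loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the agents are plain Python callables standing in for LLM calls, the names `agent_loop`, `execute_tests`, `toy_programmer`, and `toy_test_designer` are hypothetical, the tests are derived from the task text, and the iteration cap of 5 is an arbitrary choice.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical agent interfaces; in a real system each would wrap an LLM call.
Programmer = Callable[[str, Optional[str]], str]  # (task, feedback) -> source code
TestDesigner = Callable[[str], str]               # task -> test code (assert statements)


def execute_tests(code: str, tests: str) -> Optional[str]:
    """Test executor: run the code, then the tests; return None on success
    or the traceback text as feedback on failure."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(tests, namespace)  # run the assertions against them
    except Exception:
        return traceback.format_exc()
    return None


def agent_loop(task: str, programmer: Programmer, test_designer: TestDesigner,
               max_iters: int = 5) -> Tuple[str, bool, int]:
    """Alternate between code generation and test execution until the tests
    pass or the iteration budget (an assumed cap) runs out."""
    tests = test_designer(task)
    feedback: Optional[str] = None
    code = ""
    for iteration in range(1, max_iters + 1):
        code = programmer(task, feedback)      # generate or refine
        feedback = execute_tests(code, tests)  # run and collect feedback
        if feedback is None:
            return code, True, iteration
    return code, False, max_iters


# Toy agents for illustration only: the first draft is buggy, the second is fixed.
def toy_programmer(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_test_designer(task: str) -> str:
    return "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"


code, passed, iterations = agent_loop("Write add(a, b).", toy_programmer, toy_test_designer)
print(passed, iterations)  # True 2
```

Running untrusted generated code with `exec` is only acceptable in a toy setting; a practical test executor would isolate execution in a sandbox or subprocess.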

Abstract (original)

The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. During the coding procedure, the programmer agent will focus on the code generation and refinement based on the test executor agent’s feedback. The test designer agent will generate test cases for the generated code, and the test executor agent will run the code with the test cases and write the feedback to the programmer. This collaborative system ensures robust code generation, surpassing the limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder’s superior performance over existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 in HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while state-of-the-art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.
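
For readers unfamiliar with the pass@1 figures quoted above, the metric can be sketched as follows. This uses the commonly cited unbiased pass@k estimator for code benchmarks, which for k = 1 reduces to the fraction of samples that pass; the abstract does not say how the reported scores were computed, so the function names and the sample data here are illustrative assumptions only.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n generated samples, c of which
    pass all tests: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_problem: Iterable[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over a benchmark.
    Each entry is (n samples, c passing samples)."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    return sum(scores) / len(scores)


# Illustrative data (not from the paper): four problems, one sample each,
# three of which are solved.
print(mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1))  # 0.75
```

With one sample per problem, pass@1 is simply the share of problems solved. As a sanity check on scale, solving 158 of the 164 problems in HumanEval would give 96.3%; that count is an inference from the percentage, not a figure stated in the abstract.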

arXiv information

Authors	Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui
Published	2024-05-24 11:47:24+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
