An Empirical Study on LLM-based Agents for Automated Bug Fixing

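Summary

Large language models (LLMs) and LLM-based agents have been applied to automated bug fixing, demonstrating that they can address software defects through development-environment interaction, iterative validation, and code modification.
However, systematic analysis of these agent and non-agent systems remains limited, particularly regarding performance variations among the top-performing systems.
This paper examines seven proprietary and open-source systems on the SWE-bench Lite benchmark for automated bug fixing.
We first assess each system's overall performance, noting the instances that all of these systems can solve and those that none of them can solve, and explore why some instances are solved only by specific system types.
We also compare fault localization accuracy at the file and line levels and evaluate bug reproduction capabilities, identifying instances that are solvable only through dynamic reproduction.
From this analysis, we conclude that improving the effectiveness of agents in bug fixing requires further optimization of both the LLM itself and the design of the agentic flow.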

Abstract (original)

Large language models (LLMs) and LLM-based Agents have been applied to fix bugs automatically, demonstrating the capability in addressing software defects by engaging in development environment interaction, iterative validation and code modification. However, systematic analysis of these agent and non-agent systems remains limited, particularly regarding performance variations among top-performing ones. In this paper, we examine seven proprietary and open-source systems on the SWE-bench Lite benchmark for automated bug fixing. We first assess each system’s overall performance, noting instances solvable by all or none of these systems, and explore why some instances are uniquely solved by specific system types. We also compare fault localization accuracy at file and line levels and evaluate bug reproduction capabilities, identifying instances solvable only through dynamic reproduction. Through analysis, we concluded that further optimization is needed in both the LLM itself and the design of Agentic flow to improve the effectiveness of the Agent in bug fixing.

arXiv information

Authors	Xiangxin Meng, Zexiong Ma, Pengfei Gao, Chao Peng
Published	2024-11-15 14:19:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
