You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects

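Summary

The ability to run a project's test suite is essential in many scenarios, such as assessing code quality and code coverage, validating code changes made by developers or automated tools, and checking compatibility with dependencies.
Despite its importance, running a project's test suite can be difficult in practice, because each project uses different programming languages, software ecosystems, build systems, testing frameworks, and other tools.
These challenges make it hard to create a reliable, general-purpose test execution method that works across different projects.
This paper presents ExecutionAgent, an automated technique that installs arbitrary projects, configures them to run test cases, and generates project-specific scripts to reproduce the setup.
Inspired by how a human developer would approach this task, our approach is an agent based on a large language model that autonomously executes commands and interacts with the host system.
The agent uses meta-prompting to gather guidelines on the latest technologies relevant to the given project, and it iteratively refines its process based on feedback from previous steps.
In our evaluation, we applied ExecutionAgent to 50 open-source projects that use 14 different programming languages and many different build and testing tools.
The approach successfully runs the test suites of 33/55 projects, matching the test results of ground-truth test suite executions with a deviation of only 7.5%.
These results are a 6.6x improvement over the best previously available technique.
The costs of the approach are reasonable: on average per project, the execution time is 74 minutes and the LLM cost is 0.16 dollars.
We envision ExecutionAgent serving as a valuable tool for developers, automated programming tools, and researchers who need to run tests across a wide variety of projects.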

Summary (original)

The ability to execute the test suite of a project is essential in many scenarios, e.g., to assess code quality and code coverage, to validate code changes made by developers or automated tools, and to ensure compatibility with dependencies. Despite its importance, executing the test suite of a project can be challenging in practice because different projects use different programming languages, software ecosystems, build systems, testing frameworks, and other tools. These challenges make it difficult to create a reliable, universal test execution method that works across different projects. This paper presents ExecutionAgent, an automated technique that installs arbitrary projects, configures them to run test cases, and produces project-specific scripts to reproduce the setup. Inspired by the way a human developer would address this task, our approach is a large language model-based agent that autonomously executes commands and interacts with the host system. The agent uses meta-prompting to gather guidelines on the latest technologies related to the given project, and it iteratively refines its process based on feedback from the previous steps. Our evaluation applies ExecutionAgent to 50 open-source projects that use 14 different programming languages and many different build and testing tools. The approach successfully executes the test suites of 33/55 projects, while matching the test results of ground truth test suite executions with a deviation of only 7.5%. These results improve over the best previously available technique by 6.6x. The costs imposed by the approach are reasonable, with an execution time of 74 minutes and LLM costs of 0.16 dollars, on average per project. We envision ExecutionAgent to serve as a valuable tool for developers, automated programming tools, and researchers that need to execute tests across a wide variety of projects.
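The abstract describes the agent only at a high level: it runs commands on the host, reads the results, refines its next step, and finally emits a script that reproduces the setup. As a rough illustration of that execute-observe-refine loop, the sketch below is NOT the authors' ExecutionAgent implementation. The LLM is replaced by a hypothetical rule-based function, `propose_next_command`, that picks among hand-written alternative commands, and every name here (`Step`, `run_agent`, `reproduction_script`, the candidate plan) is an assumption made for illustration. It also assumes a POSIX shell.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    command: str
    returncode: int
    output: str  # combined stdout + stderr, kept as feedback for later decisions


def run_command(command: str, timeout: int = 600) -> Step:
    """Execute one shell command on the host and record its exit code and output."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return Step(command, proc.returncode, proc.stdout + proc.stderr)


def propose_next_command(history: List[Step], candidates: List[List[str]]) -> Optional[str]:
    """Hypothetical stand-in for the LLM: choose the next command from feedback.

    candidates[i] holds alternative commands for setup stage i. Replaying the
    history, a success advances to the next stage and a failure moves on to the
    next alternative for the current stage. Returns None when every stage has
    succeeded or the current stage has run out of alternatives.
    """
    stage, attempt = 0, 0
    for step in history:
        if step.returncode == 0:
            stage, attempt = stage + 1, 0
        else:
            attempt += 1
    if stage >= len(candidates) or attempt >= len(candidates[stage]):
        return None
    return candidates[stage][attempt]


def run_agent(candidates: List[List[str]], max_steps: int = 20) -> List[Step]:
    """Execute-observe-refine loop: propose a command, run it, append the result."""
    history: List[Step] = []
    for _ in range(max_steps):
        command = propose_next_command(history, candidates)
        if command is None:
            break
        history.append(run_command(command))
    return history


def reproduction_script(history: List[Step]) -> str:
    """Turn the successful commands into a bash script that replays the setup."""
    lines = ["#!/usr/bin/env bash", "set -e"]
    lines += [step.command for step in history if step.returncode == 0]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    plan = [
        ["false", "echo installing dependencies"],  # first alternative fails on purpose
        ["echo running tests"],
    ]
    print(reproduction_script(run_agent(plan)), end="")
```

On a POSIX system the demo prints a script whose body is `echo installing dependencies` followed by `echo running tests`: the failing `false` step is recorded, the stub reacts by trying the next alternative, and only successful commands end up in the replay script. A real agent would replace the fixed candidate list with LLM prompts that include the recorded command outputs.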

arXiv information

Authors	Islem Bouzenia, Michael Pradel
Published	2024-12-13 13:30:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
