SWE-smith: Scaling Data for Software Engineering Agents

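Summary

Despite recent progress in language models (LMs) for software engineering, collecting training data remains a significant pain point.
Existing datasets are small, with at most thousands of training instances drawn from 11 or fewer GitHub repositories.
The procedures for curating such datasets are often complex and require hundreds of hours of human labor.
The companion execution environments also take up several terabytes of storage, which severely limits scalability and usability.
To address this pain point, the authors introduce SWE-smith, a new pipeline for generating software engineering training data at scale.
Given any Python codebase, SWE-smith builds a corresponding execution environment, then automatically synthesizes hundreds to thousands of task instances that break existing tests in the codebase.
Using SWE-smith, they create a dataset of 50k instances sourced from 128 GitHub repositories, an order of magnitude larger than all previous works.
They train SWE-agent-LM-32B, which achieves a 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark, the state of the art among open source models.
To lower the barrier to entry for research on LM systems for automated software engineering, SWE-smith (collection procedure, task instances, trajectories, models) is open sourced.
All assets are available at https://swesmith.com.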

Summary (original)

Despite recent progress in Language Models (LMs) for software engineering, collecting training data remains a significant pain point. Existing datasets are small, with at most 1,000s of training instances from 11 or fewer GitHub repositories. The procedures to curate such datasets are often complex, necessitating hundreds of hours of human labor; companion execution environments also take up several terabytes of storage, severely limiting their scalability and usability. To address this pain point, we introduce SWE-smith, a novel pipeline for generating software engineering training data at scale. Given any Python codebase, SWE-smith constructs a corresponding execution environment, then automatically synthesizes 100s to 1,000s of task instances that break existing test(s) in the codebase. Using SWE-smith, we create a dataset of 50k instances sourced from 128 GitHub repositories, an order of magnitude larger than all previous works. We train SWE-agent-LM-32B, achieving 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark, state of the art among open source models. We open source SWE-smith (collection procedure, task instances, trajectories, models) to lower the barrier of entry for research in LM systems for automated software engineering. All assets available at https://swesmith.com.
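The abstract's central idea, synthesizing task instances that break existing tests, can be illustrated with a toy sketch. This is not SWE-smith's implementation: the function names (`passes_existing_tests`, `synthesize_task_instances`), the string-replacement edits, and the `clamp` example are all hypothetical, chosen only to show the filter "keep a candidate bug only if a previously passing test now fails".

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a function in a real codebase.
ORIGINAL_SOURCE = '''
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value
'''

# Hypothetical candidate edits (old text -> new text); each may or may not break a test.
CANDIDATE_EDITS: List[Tuple[str, str]] = [
    ("value < low", "value > low"),     # flips a comparison
    ("return high", "return low"),      # returns the wrong bound
    ("value > high", "value >= high"),  # behavior-preserving for this test
]


def passes_existing_tests(source: str) -> bool:
    """Execute the source and run the codebase's (toy) existing test against it."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)
        clamp = namespace["clamp"]
        assert clamp(5, 0, 10) == 5
        assert clamp(-1, 0, 10) == 0
        assert clamp(11, 0, 10) == 10
    except Exception:
        return False
    return True


def synthesize_task_instances(source: str, edits: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Keep only the edits that turn a passing test suite into a failing one."""
    if not passes_existing_tests(source):
        raise ValueError("the unmodified code must pass its tests first")
    instances = []
    for old, new in edits:
        if old not in source:
            continue
        candidate = source.replace(old, new, 1)
        if not passes_existing_tests(candidate):
            instances.append({"edit": f"{old} -> {new}", "source": candidate})
    return instances


instances = synthesize_task_instances(ORIGINAL_SOURCE, CANDIDATE_EDITS)
for instance in instances:
    print(instance["edit"])
```

In this sketch the third edit is discarded because the existing test still passes, so it would not yield a verifiable task. A real pipeline would run each repository's own test suite inside its execution environment rather than an inline assertion.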

arXiv information

Authors	John Yang, Kilian Leret, Carlos E. Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, Diyi Yang
Published	2025-04-30 16:56:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
