Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

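Summary

A considerable amount of research focuses on developing and evaluating large language models (LLMs) for a variety of code synthesis tasks.
These include synthesizing code from natural language instructions, synthesizing tests from code, and synthesizing explanations of code.
In contrast, the behavior of instructional code editing with LLMs has not been studied enough.
These are tasks in which the model is instructed to update a block of code provided in the prompt.
An editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, ask for a different kind of solution, or request any of many other common code edits.
We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several state-of-the-art LLMs.
Our evaluation reveals a large gap between the capabilities of state-of-the-art open and closed models.
For example, even GPT-3.5-Turbo is 8.8% better at code editing than the best open model.
We also introduce a new, carefully curated, permissively licensed training set of code edits paired with natural language instructions.
Using this training set, we show that open Code LLMs can be fine-tuned to significantly improve their code editing capabilities.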
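
To make the task format concrete, here is a minimal sketch of one instructional code-editing task: a code block, a natural-language instruction, and a check on the edited result. The `EditTask` schema, the prompt layout in `build_prompt`, and the test-based `passes` check are all assumptions made for illustration. The summary does not say how the benchmark stores tasks or scores edits.

```python
from dataclasses import dataclass


@dataclass
class EditTask:
    """One hypothetical code-editing task (illustrative schema, not the benchmark's)."""
    before: str       # code block supplied in the prompt
    instruction: str  # natural-language editing instruction
    tests: str        # assertions the edited code should satisfy (assumed scoring)


def build_prompt(task: EditTask) -> str:
    """Combine the code and the instruction into one prompt (assumed layout)."""
    return (
        f"## Code\n{task.before}\n"
        f"## Instruction\n{task.instruction}\n"
        "## Edited code\n"
    )


def passes(task: EditTask, edited: str) -> bool:
    """Run the edited code and then the tests in a fresh namespace.

    exec() on untrusted model output is unsafe; a real harness would sandbox it.
    """
    namespace: dict = {}
    try:
        exec(edited, namespace)
        exec(task.tests, namespace)
    except Exception:
        return False
    return True


task = EditTask(
    before="def area(w, h):\n    return w + h\n",
    instruction="Fix the bug: area should multiply width by height.",
    tests="assert area(2, 3) == 6\nassert area(0, 5) == 0\n",
)

# Stand-in for a model's output; a real evaluation would query an LLM here.
candidate = "def area(w, h):\n    return w * h\n"

print(passes(task, task.before))  # unedited code: the tests fail
print(passes(task, candidate))    # corrected code: the tests pass
```

The same structure covers the other instruction types mentioned above (adding or removing a feature, or asking for a different kind of solution) by changing `instruction` and `tests`.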

Summary (original)

A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language instructions, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is instructed to update a block of code provided in a prompt. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, ask for a different kind of solution, or many other common code editing tasks. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is 8.8% better than the best open model at editing code. We also introduce a new, carefully curated, permissively licensed training set of code edits coupled with natural language instructions. Using this training set, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities.

arXiv information

Authors	Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha
Published	2023-12-21 13:43:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
