Self-Edit: Fault-Aware Code Editor for Code Generation

Abstract

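Large language models (LLMs) have demonstrated an impressive ability to generate code for competitive programming tasks.
However, when only a limited number of samples can be drawn, their accuracy remains poor.
Inspired by how human programmers work, we propose Self-Edit, a generate-and-edit approach that uses the execution results of LLM-generated code to improve code quality on competitive programming tasks.
We execute the generated code on the example test case provided in the question and wrap the execution results into a supplementary comment.
Using this comment as guidance, a fault-aware code editor corrects errors in the generated code.
We perform extensive evaluations across two competitive programming datasets with nine different LLMs.
Compared with generating directly from the LLMs, our approach improves average pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval across nine popular code generation LLMs with parameter sizes ranging from 110M to 175B.
Compared with other post-processing methods, our method demonstrates superior accuracy and efficiency.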
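
The execute-and-wrap step described in the abstract might look roughly like the following. This is a minimal sketch, not the authors' implementation: the function names `run_on_example` and `build_comment`, the result dictionary, and the wording of the feedback comment are all hypothetical, and the fault-aware editor model itself is not shown.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_on_example(code: str, test_input: str, timeout: float = 5.0) -> dict:
    """Run candidate code as a script, feeding the example input on stdin.

    Hypothetical helper: the result layout is an assumption of this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                input=test_input,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


def build_comment(result: dict, expected_output: str) -> str:
    """Wrap an execution result into a one-line supplementary comment.

    The comment wording is illustrative, not the format used in the paper.
    """
    if result["status"] == "timeout":
        return "# Feedback: execution timed out on the example test case."
    if result["status"] == "error":
        lines = result["stderr"].strip().splitlines()
        last = lines[-1] if lines else "unknown error"
        return f"# Feedback: runtime error on the example test case: {last}"
    actual = result["stdout"].strip()
    expected = expected_output.strip()
    if actual == expected:
        return "# Feedback: passed the example test case."
    return (
        "# Feedback: wrong answer on the example test case. "
        f"Expected {expected!r}, got {actual!r}"
    )
```

For example, running a candidate that prints `a - b` on the input `3 4` with expected output `7` yields a wrong-answer comment. In a pipeline of this kind, that comment would be passed to the editor model together with the problem description and the generated code.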

Abstract (original)

Large language models (LLMs) have demonstrated an impressive ability to generate codes on competitive programming tasks. However, with limited sample numbers, LLMs still suffer from poor accuracy. Inspired by the process of human programming, we propose a generate-and-edit approach named Self-Edit that utilizes execution results of the generated code from LLMs to improve the code quality on the competitive programming task. We execute the generated code on the example test case provided in the question and wrap execution results into a supplementary comment. Utilizing this comment as guidance, our fault-aware code editor is employed to correct errors in the generated code. We perform extensive evaluations across two competitive programming datasets with nine different LLMs. Compared to directly generating from LLMs, our approach can improve the average of pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval over nine popular code generation LLMs with parameter sizes ranging from 110M to 175B. Compared to other post-processing methods, our method demonstrates superior accuracy and efficiency.
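
The abstract reports results as pass@1. One common way to estimate pass@k from n sampled programs, of which c pass all tests, is the unbiased estimator 1 - C(n-c, k) / C(n, k) introduced with HumanEval. The sketch below assumes that estimator; the abstract does not say how pass@1 was computed here, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k given n samples, c of which are correct.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 the estimator reduces to c / n, the fraction of sampled programs that pass.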

arXiv information

Authors	Kechi Zhang, Zhuo Li, Jia Li, Ge Li, Zhi Jin
Published	2023-09-11 06:27:53+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
