Is Self-Repair a Silver Bullet for Code Generation?

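Summary

Large language models show remarkable aptitude for code generation but still struggle with challenging tasks.
Self-repair, in which the model debugs and fixes mistakes in its own code, has recently become a popular way to boost performance in these settings.
However, the literature contains only very limited studies of how and when self-repair works effectively, and one might wonder to what extent a model can really repair mistakes in code that the same model originally generated.
This paper analyzes the ability of Code Llama, GPT-3.5, and GPT-4 to perform self-repair on problems taken from HumanEval or APPS. It finds that, once the cost of carrying out repair is taken into account, gains are often modest, vary significantly between subsets of the data, and are sometimes absent altogether.
The authors hypothesize that this is because self-repair is bottlenecked by the model's ability to provide feedback on its own code.
When the feedback is boosted with a stronger model, performance gains are observed even in settings where the model does not benefit from self-repair.
Finally, the authors find that providing the model with feedback from human participants greatly benefits repair even for GPT-4, and they carry out a brief qualitative analysis of the differences observed.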

Summary (original)

Large language models have shown remarkable aptitude in code generation, but still struggle on challenging tasks. Self-repair — in which the model debugs and fixes mistakes in its own code — has recently become a popular way to boost performance in these settings. However, only very limited studies on how and when self-repair works effectively exist in the literature, and one might wonder to what extent a model is really capable of repairing mistakes in code which was originally generated by that very same model. In this paper, we analyze Code Llama, GPT-3.5 and GPT-4’s ability to perform self-repair on problems taken from HumanEval or APPS, finding that when the cost of carrying out repair is taken into account, gains are often modest, vary significantly between subsets of the data, and are sometimes not present at all. We hypothesize that this is because self-repair is bottlenecked by the model’s ability to provide feedback on its own code; boosting the feedback with stronger models, we observe performance gains even in settings where the model does not benefit from self-repair. Finally, we find that providing the model with feedback from human participants greatly benefits repair even for GPT-4, and carry out a brief qualitative analysis of the differences observed.
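
The generate, test, feedback, repair cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callables `generate`, `give_feedback`, and `repair`, the `solve` entry point, the test format, and the stub models are all hypothetical names introduced here, and a real setup would call an actual model and run candidate programs in a sandbox rather than with `exec`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interfaces (assumptions for illustration only).
Generate = Callable[[str], str]                # spec -> program
Feedback = Callable[[str, str, str], str]      # spec, program, error -> feedback
Repair = Callable[[str, str, str, str], str]   # spec, program, error, feedback -> program


def run_tests(program: str, tests: List[Tuple[tuple, object]],
              entry: str = "solve") -> Optional[str]:
    """Return None if every test passes, otherwise a short error message."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # unsafe outside a sandbox; fine for a toy sketch
        fn = namespace[entry]
        for args, expected in tests:
            got = fn(*args)
            if got != expected:
                return f"{entry}{args!r} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_repair(spec: str, tests: List[Tuple[tuple, object]],
                generate: Generate, give_feedback: Feedback, repair: Repair,
                max_repairs: int = 1) -> Tuple[str, bool, int]:
    """Generate a program, then try up to max_repairs feedback-and-repair rounds.

    Returns the final program, whether it passes, and the number of model calls,
    so the cost of repair can be compared against simply sampling more programs.
    """
    program = generate(spec)
    calls = 1
    error = run_tests(program, tests)
    for _ in range(max_repairs):
        if error is None:
            break
        note = give_feedback(spec, program, error)
        program = repair(spec, program, error, note)
        calls += 2  # one feedback call plus one repair call
        error = run_tests(program, tests)
    return program, error is None, calls


# Stub "models" standing in for real LLM calls.
def stub_generate(spec: str) -> str:
    return "def solve(x):\n    return x + 1\n"


def stub_feedback(spec: str, program: str, error: str) -> str:
    return "The function adds 1 but should double its input."


def stub_repair(spec: str, program: str, error: str, note: str) -> str:
    return "def solve(x):\n    return x * 2\n"


final_program, passed, calls = self_repair(
    "Double the input.", [((2,), 4), ((5,), 10)],
    stub_generate, stub_feedback, stub_repair,
)
```

Keeping `give_feedback` as a separate callable mirrors the abstract's hypothesis that feedback is the bottleneck: in this sketch, swapping in a stronger model or a human-written note changes only that one argument. Returning the call count is a crude stand-in for the repair cost the abstract refers to.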

arXiv information

Authors	Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama
Published	2023-10-17 17:51:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
