Execution-based Code Generation using Deep Reinforcement Learning

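Summary

Using programming language (PL) models pre-trained on large-scale code corpora to automate software engineering processes has shown considerable potential for streamlining code generation tasks such as code completion, code translation, and program synthesis.
However, current approaches rely mainly on supervised fine-tuning objectives borrowed from text generation, and so neglect sequence-level characteristics unique to code, including but not limited to compilability and syntactic and functional correctness.
To address this limitation, the authors propose PPOCoder, a new framework for code generation that combines pre-trained PL models with Proximal Policy Optimization (PPO), a widely used deep reinforcement learning technique.
PPOCoder uses non-differentiable feedback from code execution and structure alignment to integrate external, code-specific knowledge into the model optimization process.
PPOCoder is task-agnostic and model-agnostic, so it can be applied across different code generation tasks and PLs.
Extensive experiments on three code generation tasks demonstrate the effectiveness of the approach compared to SOTA methods, with significant improvements in compilation success rate and functional correctness across different PLs.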
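As a rough illustration of what "non-differentiable feedback from code execution" combined with PPO can look like, here is a minimal Python sketch. It is not the authors' implementation: the four reward levels, the function names `execution_reward` and `ppo_clipped_term`, and the use of Python's built-in `compile` and `exec` as the "compiler" are all assumptions made for this example. The abstract does not specify the reward values or the exact objective, and the structure-alignment signal it mentions is not modeled here.

```python
import math

# Hypothetical reward levels (assumed for illustration; not given in the abstract).
REWARD_PASS = 1.0             # compiles and passes every test
REWARD_TEST_FAIL = -0.3       # runs, but a test assertion fails
REWARD_RUNTIME_ERROR = -0.6   # compiles, but raises while running
REWARD_COMPILE_ERROR = -1.0   # does not compile


def execution_reward(source: str, tests: str) -> float:
    """Score a generated Python snippet by compiling it and running assert-based tests."""
    try:
        code = compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return REWARD_COMPILE_ERROR

    namespace: dict = {}
    try:
        exec(code, namespace)      # define the generated functions
    except Exception:
        return REWARD_RUNTIME_ERROR

    try:
        exec(tests, namespace)     # run the tests against those definitions
    except AssertionError:
        return REWARD_TEST_FAIL
    except Exception:
        return REWARD_RUNTIME_ERROR
    return REWARD_PASS


def ppo_clipped_term(logp_new: float, logp_old: float,
                     advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)            # pi_new / pi_old
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # clip ratio to [1-eps, 1+eps]
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"
    reward = execution_reward(candidate, "assert add(1, 2) == 3")
    print(reward, ppo_clipped_term(-1.0, -1.2, reward))
```

The reward is a plain number computed outside the model, so no gradient flows through the compiler or the tests; a policy-gradient method such as PPO only needs that scalar to weight the log-probabilities of the sampled code. In any real setting, generated code should be executed in a sandbox rather than with a bare `exec`.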

Summary (original)

The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting unique sequence-level characteristics of code, including but not limited to compilability as well as syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that synergistically combines pre-trained PL models with Proximal Policy Optimization (PPO) which is a widely used deep reinforcement learning technique. By utilizing non-differentiable feedback from code execution and structure alignment, PPOCoder seamlessly integrates external code-specific knowledge into the model optimization process. It’s important to note that PPOCoder is a task-agnostic and model-agnostic framework that can be used across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of our proposed approach compared to SOTA methods, achieving significant improvements in compilation success rates and functional correctness across different PLs.

arXiv information

Authors	Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy
Published	2023-07-18 16:49:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
