OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

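Summary

Large language models have significantly advanced code generation.
However, open-source models often lack the execution capabilities and iterative refinement found in advanced systems such as the GPT-4 Code Interpreter.
To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed to generate, execute, and iteratively refine code.
Supported by Code-Feedback, a dataset of 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback to refine code dynamically.
A comprehensive evaluation of OpenCodeInterpreter across key benchmarks, including HumanEval, MBPP, and their enhanced versions from EvalPlus, reveals its exceptional performance.
Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) averaged over HumanEval and MBPP (and their plus versions), closely rivaling GPT-4's 84.2 (76.2), and rises further to 91.6 (84.6) with synthesized human feedback from GPT-4.
OpenCodeInterpreter narrows the gap between open-source code generation models and proprietary systems such as the GPT-4 Code Interpreter.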
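
The generate, execute, and refine cycle described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: `run_python`, `refine_loop`, and `stub_generate` are hypothetical names introduced here, and `stub_generate` is a hard-coded stand-in for a code model so that the example runs without one. The sketch only shows the general idea of feeding execution output back into the next generation attempt.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_python(code: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run `code` in a fresh Python subprocess; return (succeeded, output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stdout + proc.stderr


def refine_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_turns: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate code, execute it, and retry with the error output as feedback.

    Returns (working_code, turns_used), or (None, max_turns) if no attempt ran
    successfully. `generate` is an assumed interface: (task, feedback) -> code.
    """
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        code = generate(task, feedback)
        ok, output = run_python(code)
        if ok:
            return code, turn
        feedback = output  # execution feedback drives the next attempt
    return None, max_turns


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a model: buggy first, fixed after feedback."""
    if feedback is None:
        return "print(totl([1, 2, 3]))"  # deliberate NameError
    return "print(sum([1, 2, 3]))"


if __name__ == "__main__":
    code, turns = refine_loop("sum a list", stub_generate)
    print(f"succeeded after {turns} turn(s): {code}")
```

In a real system the stub would be replaced by a model call, and the feedback string could also carry human comments alongside the execution output. Running untrusted generated code would additionally call for sandboxing, which this sketch omits.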

Abstract (original)

The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4’s 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter brings the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

arXiv information

Authors	Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue
Published	2024-02-22 16:06:23+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
