IterPref: Focal Preference Learning for Code Generation via Iterative Debugging

Summary

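Preference learning enhances Code LLMs beyond supervised fine-tuning by leveraging relative quality comparisons.
Existing methods construct preference pairs from candidates based on test-case success, treating the sample with the higher pass rate as positive and the one with the lower pass rate as negative.
However, this approach does not pinpoint the specific errors in the code, which prevents the model from learning more informative error-correction patterns. Aligning the failing code as a whole lacks the granularity needed to capture meaningful error-resolution relationships.
To address these issues, we propose IterPref, a new preference alignment framework that mimics human iterative debugging to refine Code LLMs.
IterPref explicitly locates error regions and aligns the corresponding tokens via a tailored DPO algorithm.
To generate informative pairs, we introduce the CodeFlow dataset, in which samples are iteratively refined until they pass the tests, with the modifications capturing error corrections.
Extensive experiments show that a diverse suite of Code LLMs equipped with IterPref achieves significant performance gains in code generation and improves on challenging tasks such as BigCodeBench.
In-depth analysis reveals that IterPref yields fewer errors.
Our code and data will be made publicly available.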
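To make "locating error regions" concrete: in an iterative debugging trajectory of the kind described for CodeFlow, a failing version of a program can be compared token by token with its corrected successor. The sketch below is a minimal illustration under our own assumptions, not the authors' procedure. `tokenize` is a hypothetical regex lexer standing in for a model tokenizer, and `error_token_mask` is a hypothetical helper that marks the tokens of the failing code which the fix replaced or deleted, using Python's standard `difflib.SequenceMatcher`.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Hypothetical crude lexer: word runs, or any single non-space symbol.
    # A real pipeline would use the Code LLM's own tokenizer instead.
    return re.findall(r"\w+|[^\w\s]", code)


def error_token_mask(failing: str, fixed: str) -> list[int]:
    """Return one flag per token of `failing`: 1 if the fix replaced or
    deleted that token (assumed error region), else 0."""
    a, b = tokenize(failing), tokenize(fixed)
    mask = [0] * len(a)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "insert" opcodes add tokens only to `fixed`, so they mark nothing here.
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                mask[i] = 1
    return mask


if __name__ == "__main__":
    failing = "def add(a, b):\n    return a - b"
    fixed = "def add(a, b):\n    return a + b"
    mask = error_token_mask(failing, fixed)
    print([t for t, m in zip(tokenize(failing), mask) if m])  # ['-']
```

One design consequence worth noting: a fix that only inserts code (for example, a missing `return`) flags no token in the failing version, so a practical variant might also flag the token next to each insertion point.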

Summary (original)

Preference learning enhances Code LLMs beyond supervised fine-tuning by leveraging relative quality comparisons. Existing methods construct preference pairs from candidates based on test case success, treating the higher pass rate sample as positive and the lower as negative. However, this approach does not pinpoint specific errors in the code, which prevents the model from learning more informative error correction patterns, as aligning failing code as a whole lacks the granularity needed to capture meaningful error-resolution relationships. To address these issues, we propose IterPref, a new preference alignment framework that mimics human iterative debugging to refine Code LLMs. IterPref explicitly locates error regions and aligns the corresponding tokens via a tailored DPO algorithm. To generate informative pairs, we introduce the CodeFlow dataset, where samples are iteratively refined until passing tests, with modifications capturing error corrections. Extensive experiments show that a diverse suite of Code LLMs equipped with IterPref achieves significant performance gains in code generation and improves on challenging tasks like BigCodeBench. In-depth analysis reveals that IterPref yields fewer errors. Our code and data will be made publicly available.
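The abstract states only that IterPref aligns the tokens of the located error region through a tailored DPO algorithm; it does not give the objective. One plausible reading, sketched here purely as an assumption, is the standard DPO loss in which the rejected (failing) sample's policy/reference log-ratio is summed only over error-region tokens, while the chosen (passing) sample keeps its full-sequence log-ratio. The function name `focal_dpo_loss`, its arguments, and the default `beta=0.1` are hypothetical; the inputs are per-token log-probabilities from the policy and a frozen reference model, and `rejected_mask` could come from a diff between the failing and fixed code.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def focal_dpo_loss(policy_chosen, ref_chosen,
                   policy_rejected, ref_rejected,
                   rejected_mask, beta=0.1):
    """Hypothetical DPO variant: the rejected log-ratio counts only tokens
    with rejected_mask == 1 (assumed error region); the chosen log-ratio
    uses every token. All inputs are per-token log-probabilities."""
    if len(policy_chosen) != len(ref_chosen):
        raise ValueError("chosen log-prob lists differ in length")
    if not len(policy_rejected) == len(ref_rejected) == len(rejected_mask):
        raise ValueError("rejected log-prob lists and mask differ in length")

    # Sequence-level log pi(y_w|x) - log pi_ref(y_w|x) for the passing code.
    chosen_ratio = sum(p - r for p, r in zip(policy_chosen, ref_chosen))
    # Same quantity for the failing code, restricted to error-region tokens.
    rejected_ratio = sum(m * (p - r) for p, r, m in
                         zip(policy_rejected, ref_rejected, rejected_mask))
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))
```

Under this assumed formulation, pushing down the probability of the buggy tokens lowers the loss, while the tokens the failing and passing programs share are not penalized. That is the intuition the abstract gives for why whole-sequence pairs lack granularity.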

arXiv information

Authors	Jie Wu, Haoling Li, Xin Zhang, Jianwen Luo, Yangyu Huang, Ruihang Chu, Yujiu Yang, Scarlett Li
Published	2025-03-04 16:56:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
