Success is in the Details: Evaluate and Enhance Details Sensitivity of Code LLMs through Counterfactuals

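Summary

Code sensitivity refers to the ability of a Code LLM to recognize and respond to changes in the details of a problem description.
Current code benchmarks and instruction data focus on difficulty and diversity, but sensitivity has been overlooked.
We first introduce the CTF-Code benchmark, constructed using counterfactual perturbations that minimize changes to the input while maximizing changes to the output.
The evaluation shows that many LLMs suffer a performance drop of more than 10% compared with the original problems.
To make full use of sensitivity, CTF-Instruct, an incremental instruction fine-tuning framework, extends existing data and uses a selection mechanism to satisfy three dimensions: difficulty, diversity, and sensitivity.
Experiments show that LLMs fine-tuned with CTF-Instruct data achieve an improvement of more than 2% on CTF-Code and a performance boost of more than 10% on LiveCodeBench, validating the feasibility of improving performance by enhancing LLMs' sensitivity.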

Summary (original)

Code Sensitivity refers to the ability of Code LLMs to recognize and respond to details changes in problem descriptions. While current code benchmarks and instruction data focus on difficulty and diversity, sensitivity is overlooked. We first introduce the CTF-Code benchmark, constructed using counterfactual perturbations, minimizing input changes while maximizing output changes. The evaluation shows that many LLMs have a more than 10% performance drop compared to the original problems. To fully utilize sensitivity, CTF-Instruct, an incremental instruction fine-tuning framework, extends on existing data and uses a selection mechanism to meet the three dimensions of difficulty, diversity, and sensitivity. Experiments show that LLMs fine-tuned with CTF-Instruct data achieve over a 2% improvement on CTF-Code, and more than a 10% performance boost on LiveCodeBench, validating the feasibility of enhancing LLMs’ sensitivity to improve performance.
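
The idea of a counterfactual perturbation (a small change to the problem text that causes a large change in the correct output) can be illustrated with a toy pair. The sketch below is only an illustration under stated assumptions: the two problem descriptions, the reference solutions, and the helper names `description_similarity`, `output_divergence`, and `pass_rate` are all hypothetical and are not taken from CTF-Code, and the abstract does not say how the benchmark measures either quantity.

```python
from difflib import SequenceMatcher

# Hypothetical problem pair: one word of the description changes.
ORIGINAL_DESC = "Given a list of integers, return the sum of the even numbers."
PERTURBED_DESC = "Given a list of integers, return the sum of the odd numbers."


def solve_original(nums):
    """Reference solution for the original description."""
    return sum(n for n in nums if n % 2 == 0)


def solve_perturbed(nums):
    """Reference solution for the perturbed description."""
    return sum(n for n in nums if n % 2 != 0)


def description_similarity(a, b):
    """Word-level similarity in [0, 1]; higher means a smaller input change."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def output_divergence(f, g, inputs):
    """Fraction of test inputs on which the two reference solutions disagree."""
    differing = sum(1 for x in inputs if f(x) != g(x))
    return differing / len(inputs)


def pass_rate(candidate, reference, inputs):
    """Fraction of test inputs on which a candidate matches the reference."""
    passed = sum(1 for x in inputs if candidate(x) == reference(x))
    return passed / len(inputs)


TEST_INPUTS = [[1, 2, 3], [2, 4], [1, 3, 5], []]

similarity = description_similarity(ORIGINAL_DESC, PERTURBED_DESC)
divergence = output_divergence(solve_original, solve_perturbed, TEST_INPUTS)

# A model that ignores the changed detail effectively answers the
# original problem, so it is scored against the perturbed reference.
insensitive_score = pass_rate(solve_original, solve_perturbed, TEST_INPUTS)

print(f"description similarity: {similarity:.2f}")
print(f"output divergence:      {divergence:.2f}")
print(f"insensitive pass rate:  {insensitive_score:.2f}")
```

In this toy setting the descriptions are nearly identical while most test outputs differ, so a solution that overlooks the single changed word passes only the input on which both answers happen to coincide.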

arXiv information

Authors	Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Mingzheng Xu, Tianhao Cheng, Yixuan Wang, Zheng Chu, Shijie Xuyang, Zhiyuan Ma, YuanTao Fan, Wanxiang Che
Published	2025-05-20 16:48:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
