RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation

Summary

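Repository-level code translation means translating an entire code repository from one programming language to another while preserving the functionality of the source repository.
Many benchmarks have been proposed to evaluate the performance of such code translators.
However, earlier benchmarks mostly provide fine-grained samples that focus on translating code snippets, functions, or single files.
Such benchmarks do not accurately reflect real-world demand, where entire repositories often need to be translated, with longer code and more complex functionality.
To address this gap, we propose a new benchmark named RepoTransBench, a real-world repository-level code translation benchmark with an automatically executable test suite.
We run experiments on RepoTransBench to evaluate the translation performance of 11 advanced LLMs.
The best-performing LLM achieves a Success@1 score (test success in one attempt) of only 7.33%.
To further explore the potential of LLMs for repository-level code translation, we give the LLMs error-related feedback so that they can debug iteratively, and observe an average improvement of 7.09% in Success@1.
However, even with this improvement, the best-performing LLM's Success@1 score is only 21%, which may not meet the need for reliable automatic repository-level code translation.
Finally, we conduct a detailed error analysis that highlights the deficiencies of current LLMs in repository-level code translation, which could serve as a reference for further improvement.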
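
As a rough illustration of the Success@1 metric mentioned above, the following minimal sketch computes the percentage of repositories whose first translation attempt passes its test suite. The function name `success_at_1`, the boolean input format, and the toy data are assumptions made for this example. The abstract only describes the metric as "test success in one attempt", so the exact pass criterion used by RepoTransBench may differ.

```python
from typing import Sequence


def success_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Return the percentage of repositories whose first attempt passed.

    Assumption: each entry is True when the translated repository passed
    its test suite on the first (and only) attempt, False otherwise.
    """
    if not first_attempt_passed:
        raise ValueError("need at least one repository result")
    return 100.0 * sum(first_attempt_passed) / len(first_attempt_passed)


# Hypothetical toy data, not taken from the paper: 3 of 8 repositories pass.
results = [True, False, False, True, False, False, True, False]
print(f"Success@1 = {success_at_1(results):.2f}%")
```

The same function applied to results collected after iterative debugging would give the post-feedback score, which makes the before and after figures directly comparable.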

Summary (original)

Repository-level code translation refers to translating an entire code repository from one programming language to another while preserving the functionality of the source repository. Many benchmarks have been proposed to evaluate the performance of such code translators. However, previous benchmarks mostly provide fine-grained samples, focusing at either code snippet, function, or file-level code translation. Such benchmarks do not accurately reflect real-world demands, where entire repositories often need to be translated, involving longer code length and more complex functionalities. To address this gap, we propose a new benchmark, named RepoTransBench, which is a real-world repository-level code translation benchmark with an automatically executable test suite. We conduct experiments on RepoTransBench to evaluate the translation performance of 11 advanced LLMs. We find that the Success@1 score (test success in one attempt) of the best-performing LLM is only 7.33%. To further explore the potential of LLMs for repository-level code translation, we provide LLMs with error-related feedback to perform iterative debugging and observe an average 7.09% improvement on Success@1. However, even with this improvement, the Success@1 score of the best-performing LLM is only 21%, which may not meet the need for reliable automatic repository-level code translation. Finally, we conduct a detailed error analysis and highlight current LLMs’ deficiencies in repository-level code translation, which could provide a reference for further improvements.

arXiv information

Authors	Yanli Wang, Yanlin Wang, Suiquan Wang, Daya Guo, Jiachi Chen, John Grundy, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng
Published	2024-12-23 17:52:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
