CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

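Summary

C-to-Rust transpilation is essential for modernizing legacy C code while improving safety and interoperability with the modern Rust ecosystem.
However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust.
We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust and with test cases that can be used to validate the correctness of the transpilation.
By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files.
The provided Rust interfaces serve as explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness.
We evaluate state-of-the-art large language models (LLMs) on this task and find that generating safe, idiomatic Rust remains a challenging problem for a range of state-of-the-art methods and techniques.
We also provide insights into the errors LLMs typically make when transpiling code from C to safe Rust.
The best-performing model, OpenAI o1, solves only 15 tasks in a single-shot setting.
Improvements on CRUST-Bench would lead to better transpilation systems that can reason about complex scenarios and help migrate legacy codebases from C to languages like Rust that ensure memory safety.
The dataset and code are available at https://github.com/anirudhkhatry/crust-bench.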

Abstract (original)

C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases that can be used to validate correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The provided Rust interfaces provide explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem for various state-of-the-art methods and techniques. We also provide insights into the errors LLMs usually make in transpiling code from C to safe Rust. The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting. Improvements on CRUST-Bench would lead to improved transpilation systems that can reason about complex scenarios and help in migrating legacy codebases from C into languages like Rust that ensure memory safety. You can find the dataset and code at https://github.com/anirudhkhatry/CRUST-bench.
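
To make the task setup concrete, the following is a minimal sketch of what a "safe Rust interface plus test case" pairing could look like for a small C component. It is not taken from the CRUST-Bench dataset: the `Stack` type, its methods, and the C declarations quoted in the comments are all hypothetical and chosen purely for illustration.

```rust
// Hypothetical example; not drawn from the CRUST-Bench repositories.
// Assumed C original, shown for comparison only:
//   struct stack { int *data; size_t len, cap; };
//   int stack_push(struct stack *s, int v);   /* -1 on allocation failure */
//   int stack_pop(struct stack *s, int *out); /* -1 if the stack is empty */

/// A safe-Rust interface for the assumed C stack.
/// The raw pointer and manual length/capacity fields become a `Vec<i32>`.
#[derive(Debug, Default)]
pub struct Stack {
    data: Vec<i32>,
}

impl Stack {
    /// Creates an empty stack.
    pub fn new() -> Self {
        Stack { data: Vec::new() }
    }

    /// Pushes a value onto the top of the stack.
    pub fn push(&mut self, value: i32) {
        self.data.push(value);
    }

    /// Pops the top value; `None` replaces the C-style -1 error code
    /// and out-parameter.
    pub fn pop(&mut self) -> Option<i32> {
        self.data.pop()
    }

    /// Number of stored elements.
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// True when the stack holds no elements.
    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A test of this kind checks functional correctness of the translation.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut s = Stack::new();
        s.push(1);
        s.push(2);
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), Some(1));
        assert_eq!(s.pop(), None);
    }
}
```

In this sketch the signatures play the role of the interface (no `unsafe`, no raw pointers, errors expressed through `Option`), and the test module plays the role of the accompanying test case; a transpilation system would be expected to supply bodies that compile against the interface and pass the tests.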

arXiv information

Authors	Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig
Published	2025-04-21 17:33:33+00:00

