Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast

要約

静的検証はソフトウェアの品質を向上させる強力な方法ですが、多大な人的労力とリソースを必要とします。
これは、所有権ロジックを使用してプログラムを操作するヒープについて推論する静的ベリファイアに特に当てはまります。
LLM は、コード生成、テスト生成、定理証明者の証明生成、静的検証者の仕様生成など、多くのソフトウェアエンジニアリング活動で有望であることが示されています。
しかし、これまでの研究では、LLM が分離ロジックなどの所有権ロジックに基づいて仕様の仕様生成をどの程度うまく実行できるかについては調査されていませんでした。
このギャップに対処するために、この論文では、VeriFast で人間が作成したプログラムの静的検証のための分離ロジックに基づいて完全に正しい仕様を生成する際の大規模言語モデル (LLM)、特に OpenAI の GPT モデルの有効性を調査します。
最初の実験では従来のプロンプトエンジニアリングを使用し、2 番目の実験では思考連鎖 (CoT) プロンプトを使用して、GPT モデル全体で生成される一般的なエラーを特定して対処しました。
結果は、GPT モデルが VeriFast でヒープ操作コードを検証するための仕様を正常に生成できることを示しています。
さらに、CoT プロンプトは GPT モデルによって生成される構文エラーを大幅に削減しますが、プロンプトエンジニアリングと比較して検証エラー率を大きく改善するわけではありません。

要約(オリジナル)

Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap manipulating programs using an ownership logic. LLMs have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can perform specification generation for specifications based in an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of large language models (LLMs), specifically OpenAI’s GPT models, in generating fully correct specifications based on separation logic for static verification of human-written programs in VeriFast. Our first experiment employed traditional prompt engineering and the second used Chain-of-Thought (CoT) Prompting to identify and address common errors generated across the GPT models. The results indicate that GPT models can successfully generate specifications for verifying heap manipulating code with VeriFast. Furthermore, while CoT prompting significantly reduces syntax errors generated by the GPT models, it does not greatly improve verification error rates compared to prompt engineering.

arxiv情報

著者	Marilyn Rego,Wen Fan,Xin Hu,Sanya Dod,Zhaorui Ni,Danning Xie,Jenna DiVincenzo,Lin Tan
発行日	2024-11-04 17:44:11+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー