Software Vulnerability and Functionality Assessment using LLMs

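Summary

Code review is central to the software development process, but it can be tedious and expensive to carry out.
In this paper, we investigate whether and how large language models (LLMs) can help with code review.
Our investigation focuses on two tasks that we argue are fundamental to a good review: (i) flagging code that contains security vulnerabilities, and (ii) validating software functionality, that is, confirming that the code does what it is intended to do.
To test performance on both tasks, we use zero-shot and chain-of-thought prompting to obtain a final "approve or reject" recommendation.
As data, we use the seminal code generation datasets HumanEval and MBPP, together with expert-written code snippets containing security vulnerabilities from the Common Weakness Enumeration (CWE).
Our experiments consider a mix of three proprietary OpenAI models and smaller open-source LLMs.
We find that the proprietary models outperform the open-source ones by a large margin.
Motivated by these promising results, we finally ask the models to provide detailed descriptions of the security vulnerabilities.
The results show that 36.7% of the LLM-generated descriptions can be associated with true CWE vulnerabilities.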
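
The zero-shot and chain-of-thought prompting setup mentioned in the summary can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual prompt wording, so the templates, the function names `build_prompt` and `parse_recommendation`, and the "last verdict wins" parsing rule are all hypothetical choices made for illustration. No model API is called here; the sketch only shows how a prompt might be assembled and how an "approve or reject" verdict might be extracted from a free-text reply.

```python
import re

# Hypothetical prompt templates: the paper's real prompts are not in the abstract.
ZERO_SHOT = (
    "You are reviewing code.\n"
    "Task: {task}\n"
    "Code:\n{code}\n"
    "Answer with exactly one word: APPROVE or REJECT."
)

CHAIN_OF_THOUGHT = (
    "You are reviewing code.\n"
    "Task: {task}\n"
    "Code:\n{code}\n"
    "Think step by step, then finish with a line "
    "'Final answer: APPROVE' or 'Final answer: REJECT'."
)


def build_prompt(code, task, chain_of_thought=False):
    """Fill the zero-shot or chain-of-thought template with a code snippet."""
    template = CHAIN_OF_THOUGHT if chain_of_thought else ZERO_SHOT
    return template.format(task=task, code=code)


def parse_recommendation(reply):
    """Return 'approve', 'reject', or None if the reply contains no verdict.

    Assumption: when several verdict words appear (common in step-by-step
    reasoning), the last one is treated as the final recommendation.
    """
    matches = re.findall(r"\b(approve|reject)\b", reply, flags=re.IGNORECASE)
    return matches[-1].lower() if matches else None
```

For a vulnerability check, `task` could be something like "Decide whether this code contains a security vulnerability", and for functionality validation it could carry the intended behavior of the function. Taking the last verdict word is a deliberate simplification: it tolerates replies that discuss both outcomes before concluding, but a stricter parser could require the exact "Final answer:" line.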

Abstract (original)

While code review is central to the software development process, it can be tedious and expensive to carry out. In this paper, we investigate whether and how Large Language Models (LLMs) can aid with code reviews. Our investigation focuses on two tasks that we argue are fundamental to good reviews: (i) flagging code with security vulnerabilities and (ii) performing software functionality validation, i.e., ensuring that code meets its intended functionality. To test performance on both tasks, we use zero-shot and chain-of-thought prompting to obtain final “approve or reject” recommendations. As data, we employ seminal code generation datasets (HumanEval and MBPP) along with expert-written code snippets with security vulnerabilities from the Common Weakness Enumeration (CWE). Our experiments consider a mixture of three proprietary models from OpenAI and smaller open-source LLMs. We find that the former outperforms the latter by a large margin. Motivated by promising results, we finally ask our models to provide detailed descriptions of security vulnerabilities. Results show that 36.7% of LLM-generated descriptions can be associated with true CWE vulnerabilities.

arXiv information

Authors	Rasmus Ingemann Tuffveson Jensen, Vali Tawosi, Salwa Alamir
Published	2024-03-13 11:29:13+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
