Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

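Summary

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities.
Because access to the source code of vital systems (firmware, drivers, and proprietary software used in critical infrastructure (CI)) is limited, this analysis matters even more at the binary level.
Even when source code is available, a semantic gap remains after compilation between the source and the binary code that the processor executes.
This gap can hinder the detection of vulnerabilities in source code.
Nevertheless, current research on large language models (LLMs) focuses solely on source code and so overlooks the importance of decompiled binaries in this area.
In this work, we are the first to show empirically that state-of-the-art LLMs have substantial semantic limitations when analyzing vulnerabilities in decompiled binaries, largely because relevant datasets do not exist.
To bridge this gap, we introduce DeBinVul, a new dataset of decompiled binary code vulnerabilities.
The dataset is multi-architecture and multi-optimization, and it focuses on C/C++ because these languages are widely used in CI and are associated with numerous vulnerabilities.
Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for four tasks: (i) identifying vulnerabilities, (ii) classifying them, (iii) describing them, and (iv) recovering function names in decompiled binaries.
We then fine-tune state-of-the-art LLMs on DeBinVul and report performance increases of 19%, 24%, and 21% for CodeLlama, Llama3, and CodeGen2, respectively, in detecting binary code vulnerabilities.
Using DeBinVul, we also report high performance of 80 to 90% on the vulnerability classification task.
Finally, we report improved performance on the function name recovery and vulnerability description tasks.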

Summary (original)

Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems – such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI) – makes this analysis even more crucial on the binary level. Even with available source code, a semantic gap persists after compilation between the source and the binary code executed by the processor. This gap may hinder the detection of vulnerabilities in source code. That being said, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs when it comes to analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the task of (i) identifying; (ii) classifying; (iii) describing vulnerabilities; and (iv) recovering function names in the domain of decompiled binaries. Subsequently, we fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in the capabilities of CodeLlama, Llama3, and CodeGen2 respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. Furthermore, we report improved performance in function name recovery and vulnerability description tasks.
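As a rough illustration of how the four tasks described in the abstract could share one record, here is a minimal Python sketch. Every field name, the label strings, and the `detection_accuracy` helper are assumptions made for illustration; the abstract does not specify the actual DeBinVul schema or the metric used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecompiledSample:
    """One hypothetical record; field names are illustrative, not DeBinVul's schema."""
    decompiled_code: str                 # decompiler output for a single function
    architecture: str                    # target architecture label (assumed field)
    optimization: str                    # compiler optimization level label (assumed field)
    is_vulnerable: bool                  # task (i): identification
    vulnerability_class: Optional[str]   # task (ii): classification; None if not vulnerable
    description: Optional[str]           # task (iii): description; None if not vulnerable
    function_name: str                   # task (iv): function name recovery target


def detection_accuracy(samples: List[DecompiledSample], predictions: List[bool]) -> float:
    """Fraction of samples whose predicted vulnerable/non-vulnerable label is correct."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    if not samples:
        return 0.0
    correct = sum(s.is_vulnerable == p for s, p in zip(samples, predictions))
    return correct / len(samples)
```

Keeping all four labels on one record means the same decompiled function can serve every task, and the architecture and optimization fields allow results to be broken down per build configuration.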

arXiv information

Authors	Dylan Manuel, Nafis Tanveer Islam, Joseph Khoury, Ana Nunez, Elias Bou-Harb, Peyman Najafirad
Published	2024-11-07 18:54:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
