EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code

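Summary

Automated detection of software vulnerabilities is critical for strengthening security, yet existing methods often struggle with the complexity and diversity of modern codebases.
This paper introduces EnStack, a novel ensemble stacking framework that uses natural language processing (NLP) techniques to improve vulnerability detection.
The approach combines several pre-trained large language models (LLMs) specialized in code understanding: CodeBERT for semantic analysis, GraphCodeBERT for structural representation, and UniXcoder for cross-modal capabilities.
By fine-tuning these models on the Draper VDISC dataset and integrating their outputs through meta-classifiers such as Logistic Regression, Support Vector Machines (SVM), Random Forest, and XGBoost, EnStack captures intricate code patterns and vulnerabilities that individual models may overlook.
The meta-classifiers consolidate the strengths of each LLM, yielding a comprehensive model that excels at detecting subtle and complex vulnerabilities across diverse programming contexts.
Experimental results show that EnStack significantly outperforms existing methods, with notable improvements in accuracy, precision, recall, and F1-score.
This work highlights the potential of ensemble LLM approaches for code analysis tasks and offers insights into applying NLP techniques to advance automated vulnerability detection.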
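The stacking idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: no real CodeBERT, GraphCodeBERT, or UniXcoder model is loaded, and the Draper VDISC dataset is not used. Instead, a hypothetical `simulate_base_model` function produces noisy vulnerability probabilities for synthetic binary labels, and a small hand-written logistic regression stands in for the meta-classifier. All function names, the noise level, and the choice to center probabilities at 0.5 are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_base_model(labels, noise, rng):
    """Hypothetical stand-in for one fine-tuned model: a noisy P(vulnerable)."""
    return [sigmoid((1.0 if y == 1 else -1.0) + rng.gauss(0.0, noise)) for y in labels]


def train_logistic_meta(features, labels, epochs=300, lr=0.5):
    """Fit a logistic-regression meta-classifier by batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Predict 1 (vulnerable) when the meta-classifier's logit is non-negative."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0.0 else 0


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def run_demo(seed=0, n_train=500, n_test=1000):
    rng = random.Random(seed)
    # Names are labels only; the outputs below are simulated, not real model outputs.
    names = ["codebert", "graphcodebert", "unixcoder"]
    y_train = [rng.randint(0, 1) for _ in range(n_train)]
    y_test = [rng.randint(0, 1) for _ in range(n_test)]
    train_probs = [simulate_base_model(y_train, 1.2, rng) for _ in names]
    test_probs = [simulate_base_model(y_test, 1.2, rng) for _ in names]
    # Meta-features: one centered probability per base model, per sample.
    x_train = [[p - 0.5 for p in row] for row in zip(*train_probs)]
    x_test = [[p - 0.5 for p in row] for row in zip(*test_probs)]
    weights, bias = train_logistic_meta(x_train, y_train)
    stacked_acc = accuracy([predict(weights, bias, x) for x in x_test], y_test)
    base_acc = {
        name: accuracy([int(p >= 0.5) for p in probs], y_test)
        for name, probs in zip(names, test_probs)
    }
    return base_acc, stacked_acc


if __name__ == "__main__":
    base_acc, stacked_acc = run_demo()
    print("base:", base_acc)
    print("stacked:", stacked_acc)
```

Because the simulated base models make independent errors, the meta-classifier can combine them into a prediction that is more reliable than any single one. In a real pipeline, the meta-classifier would normally be fit on held-out predictions from the fine-tuned models rather than on predictions for their own training data, and any of the meta-classifiers named in the summary could replace the logistic regression used here.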

Summary (original)

Automated detection of software vulnerabilities is critical for enhancing security, yet existing methods often struggle with the complexity and diversity of modern codebases. In this paper, we introduce EnStack, a novel ensemble stacking framework that enhances vulnerability detection using natural language processing (NLP) techniques. Our approach synergizes multiple pre-trained large language models (LLMs) specialized in code understanding: CodeBERT for semantic analysis, GraphCodeBERT for structural representation, and UniXcoder for cross-modal capabilities. By fine-tuning these models on the Draper VDISC dataset and integrating their outputs through meta-classifiers such as Logistic Regression, Support Vector Machines (SVM), Random Forest, and XGBoost, EnStack effectively captures intricate code patterns and vulnerabilities that individual models may overlook. The meta-classifiers consolidate the strengths of each LLM, resulting in a comprehensive model that excels in detecting subtle and complex vulnerabilities across diverse programming contexts. Experimental results demonstrate that EnStack significantly outperforms existing methods, achieving notable improvements in accuracy, precision, recall, and F1-score. This work highlights the potential of ensemble LLM approaches in code analysis tasks and offers valuable insights into applying NLP techniques for advancing automated vulnerability detection.

arXiv information

Authors	Shahriyar Zaman Ridoy, Md. Shazzad Hossain Shaon, Alfredo Cuzzocrea, Mst Shapna Akter
Published	2024-11-25 16:47:10+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
