SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security

Summary

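This paper introduces SecQA, a new dataset tailored to evaluating the performance of large language models (LLMs) in the domain of computer security.
SecQA uses multiple-choice questions generated by GPT-4 from the textbook ‘Computer Systems Security: Planning for Success’ to assess how well LLMs understand and apply security principles.
The paper details the structure and intent of SecQA, which comprises two versions of increasing complexity, to provide a concise evaluation across a range of difficulty levels.
It also presents an extensive evaluation of prominent LLMs, including GPT-3.5-Turbo, GPT-4, Llama-2, Vicuna, Mistral, and Zephyr models, in both 0-shot and 5-shot learning settings.
The results, reported on the SecQA v1 and v2 datasets, highlight the varying capabilities and limitations of these models in the computer security context.
Beyond offering insight into how well current LLMs understand security-related content, the study establishes SecQA as a benchmark for future advances in this important research area.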

Summary (original)

In this paper, we introduce SecQA, a novel dataset tailored for evaluating the performance of Large Language Models (LLMs) in the domain of computer security. Utilizing multiple-choice questions generated by GPT-4 based on the ‘Computer Systems Security: Planning for Success’ textbook, SecQA aims to assess LLMs’ understanding and application of security principles. We detail the structure and intent of SecQA, which includes two versions of increasing complexity, to provide a concise evaluation across various difficulty levels. Additionally, we present an extensive evaluation of prominent LLMs, including GPT-3.5-Turbo, GPT-4, Llama-2, Vicuna, Mistral, and Zephyr models, using both 0-shot and 5-shot learning settings. Our results, encapsulated in the SecQA v1 and v2 datasets, highlight the varying capabilities and limitations of these models in the computer security context. This study not only offers insights into the current state of LLMs in understanding security-related content but also establishes SecQA as a benchmark for future advancements in this critical research area.
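The abstract describes scoring models on multiple-choice questions in 0-shot and 5-shot settings. As a rough illustration only, a k-shot multiple-choice prompt and an accuracy score might be assembled as below. The `MCQuestion` type, the prompt template, and the two sample questions are hypothetical assumptions for this sketch; they are not the SecQA file format, SecQA data, or the authors' evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class MCQuestion:
    # Hypothetical record layout; not the actual SecQA schema.
    question: str
    choices: dict[str, str]  # letter -> option text, e.g. {"A": "...", "B": "..."}
    answer: str              # letter of the correct option


def format_question(q: MCQuestion, include_answer: bool) -> str:
    """Render one question; solved examples include the answer letter."""
    lines = [f"Question: {q.question}"]
    for letter in sorted(q.choices):
        lines.append(f"{letter}. {q.choices[letter]}")
    lines.append(f"Answer: {q.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(dev: list[MCQuestion], target: MCQuestion, k: int) -> str:
    """k=0 yields a 0-shot prompt; k=5 prepends five solved dev examples."""
    # Assumed instruction header, in the style of common MCQ benchmarks.
    header = "The following are multiple choice questions about computer security.\n\n"
    shots = [format_question(q, include_answer=True) for q in dev[:k]]
    return header + "\n\n".join(shots + [format_question(target, include_answer=False)])


def accuracy(predictions: list[str], questions: list[MCQuestion]) -> float:
    """Fraction of predictions whose first letter matches the gold answer."""
    if len(predictions) != len(questions):
        raise ValueError("predictions and questions must have the same length")
    correct = sum(
        p.strip().upper()[:1] == q.answer for p, q in zip(predictions, questions)
    )
    return correct / len(questions)


# Toy usage with made-up questions (not taken from SecQA).
dev = [MCQuestion("Which port does HTTPS use by default?",
                  {"A": "21", "B": "80", "C": "443", "D": "22"}, "C")]
target = MCQuestion("Which of these is a symmetric cipher?",
                    {"A": "RSA", "B": "AES", "C": "ECDSA", "D": "Diffie-Hellman"}, "B")
print(build_prompt(dev, target, k=1))
```

Keeping the shot count as a single parameter `k` lets the same code produce both the 0-shot and the 5-shot prompts; only the number of solved examples prepended to the target question changes.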

arXiv information

Author	Zefang Liu
Published	2023-12-26 00:59:30+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
