Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

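Abstract

LLM-based agents, which are powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, but they can also introduce critical security vulnerabilities. The existing literature, however, does not comprehensively evaluate attacks on and defenses for LLM-based agents. To address this gap, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate attacks on and defenses for LLM-based agents. The framework includes 10 scenarios (for example, e-commerce, autonomous driving, and finance), 10 agents targeting those scenarios, more than 400 tools, 23 different types of attack and defense methods, and 8 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones, for a total of nearly 90,000 test cases. Our benchmark results reveal critical vulnerabilities at different stages of agent operation, including the system prompt, user prompt handling, tool usage, and memory retrieval. The highest average attack success rate reaches 84.30%, while current defenses show only limited effectiveness, which points to important work that remains for the community on agent security. Our code is available at https://github.com/agiresearch/ASB.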

Abstract (original)

Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 23 different types of attack/defense methods, and 8 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones with nearly 90,000 testing cases in total. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, unveiling important works to be done in terms of agent security for the community. Our code can be found at https://github.com/agiresearch/ASB.

arXiv information

Authors	Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang
Published	2024-10-03 16:30:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
