Defeating Prompt Injections by Design

要約

大規模な言語モデル（LLM）は、外部環境と相互作用するエージェントシステムにますます展開されています。
ただし、LLMエージェントは、信頼されていないデータを処理する際の迅速な注入攻撃に対して脆弱です。
この論文では、LLMの周りに保護システム層を作成する堅牢な防御であるCamelを提案し、基礎となるモデルが攻撃の影響を受けやすい場合でも保護します。
操作するために、キャメルは（信頼できる）クエリからコントロールとデータの流れを明示的に抽出します。
したがって、LLMによって取得された信頼されていないデータは、プログラムの流れに決して影響を与えることはありません。
セキュリティをさらに改善するために、Camelは、不正なデータフローよりもプライベートデータの拡張を防ぐ能力の概念に依存しています。
最近のエージェントセキュリティベンチマークであるAgentdojo [Neurips 2024]の証明可能なセキュリティを持つタスクの67ドル\％$を解くことにより、ラクダの有効性を実証します。

要約(オリジナル)

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL relies on a notion of a capability to prevent the exfiltration of private data over unauthorized data flows. We demonstrate effectiveness of CaMeL by solving $67\%$ of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.

arxiv情報

著者	Edoardo Debenedetti,Ilia Shumailov,Tianqi Fan,Jamie Hayes,Nicholas Carlini,Daniel Fabian,Christoph Kern,Chongyang Shi,Andreas Terzis,Florian Tramèr
発行日	2025-03-24 15:54:10+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Defeating Prompt Injections by Design

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー