A Practical Memory Injection Attack against LLM Agents

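Summary

Agents based on large language models (LLMs) have demonstrated strong capabilities across a wide range of complex, real-world applications.
However, an LLM agent whose memory bank has been compromised can easily produce harmful outputs when the past records it retrieves as demonstrations are malicious.
This paper proposes MINJA, a novel memory injection attack that injects malicious records into the memory bank solely by interacting with the agent through queries and output observations.
These malicious records are designed to elicit a sequence of malicious reasoning steps that lead to undesirable agent actions when the agent executes a victim user's query.
Specifically, the paper introduces a sequence of bridging steps that link the victim query to the malicious reasoning steps.
During injection of the malicious record, an indication prompt guides the agent to generate the designed bridging steps autonomously.
The paper also proposes a progressive shortening strategy that gradually removes the indication prompt, so that the malicious record is easily retrieved when the victim query is processed later.
Extensive experiments across diverse agents demonstrate MINJA's effectiveness in compromising agent memory.
Because it has minimal requirements for execution, MINJA enables any user to influence agent memory, which highlights a practical risk of LLM agents.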
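
The retrieval side of the progressive shortening strategy can be illustrated with a toy sketch. It is not the paper's implementation: the token-overlap (Jaccard) similarity, the function names, and the placeholder strings are all assumptions made for illustration, and a real agent would more likely retrieve records with embedding similarity. The sketch only shows why a stored query looks more like a later, similar query as the appended indication prompt is shortened.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (toy tokenizer, an assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity, standing in for the agent's retrieval score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def progressive_shortening(base_query, indication_prompt, steps):
    """Yield queries whose appended indication prompt shrinks to nothing."""
    words = indication_prompt.split()
    for i in range(steps + 1):
        keep = len(words) * (steps - i) // steps  # words still appended
        suffix = " ".join(words[:keep])
        yield (base_query + " " + suffix).strip()


# Hypothetical placeholder strings; none of them come from the paper.
base = "show the order status for customer A"
indication = "placeholder hint text that is not part of the real query"
later_query = "show the order status for customer B"

queries = list(progressive_shortening(base, indication, steps=4))
scores = [jaccard(q, later_query) for q in queries]

for q, s in zip(queries, scores):
    print(f"{s:.3f}  {q}")
```

Under these assumptions the similarity score rises at every shortening step, and the final query carries no indication prompt at all, which matches the summary's point that the shortened record is the one most likely to be retrieved later.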

Abstract (original)

Agents based on large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, that enables the injection of malicious records into the memory bank by only interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps leading to undesirable agent actions when executing the victim user’s query. Specifically, we introduce a sequence of bridging steps to link the victim query to the malicious reasoning steps. During the injection of the malicious record, we propose an indication prompt to guide the agent to autonomously generate our designed bridging steps. We also propose a progressive shortening strategy that gradually removes the indication prompt, such that the malicious record will be easily retrieved when processing the victim query comes after. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting practical risks of LLM agents.

arXiv information

Authors	Shen Dong, Shaocheng Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, Zhen Xiang
Published	2025-03-05 17:53:24+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
