Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models

Summary

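Retrieval-augmented generation (RAG) systems built on large language models (LLMs) have become essential for tasks such as question answering and content generation.
However, their growing influence on public opinion and information dissemination, combined with their inherent vulnerabilities, has made them a key focus of security research.
Previous studies have mainly addressed attacks that target factual content or manipulate a single query.
In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, where the LLM must reason over and synthesize multiple perspectives, which makes it particularly susceptible to systematic knowledge poisoning.
Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries.
The approach combines traditional adversarial ranking attack techniques with the extensive internal knowledge and reasoning capabilities of LLMs to carry out semantic-level perturbations.
Experiments show that the proposed attack effectively shifts the opinion expressed in the model's outputs on specific topics, significantly affecting how users perceive information.
Current mitigation methods cannot effectively defend against such attacks, which highlights the need for stronger safeguards in RAG systems and offers important insights for LLM security research.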

Summary (original)

Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. However, their increasing impact on public opinion and information dissemination has made them a critical focus for security research due to inherent vulnerabilities. Previous studies have predominantly addressed attacks targeting factual or single-query manipulations. In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, where LLMs are required to reason and synthesize multiple perspectives, rendering them particularly susceptible to systematic knowledge poisoning. Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries. This approach combines traditional adversarial ranking attack techniques and leverages the extensive internal relevant knowledge and reasoning capabilities of LLMs to execute semantic-level perturbations. Experiments show that the proposed attacks effectively shift the opinion of the model’s outputs on specific topics, significantly impacting user information perception. Current mitigation methods cannot effectively defend against such attacks, highlighting the necessity for enhanced safeguards for RAG systems, and offering crucial insights for LLM security research.

arXiv information

Authors	Yuyang Gong, Zhuo Chen, Miaokun Chen, Fengchang Yu, Wei Lu, Xiaofeng Wang, Xiaozhong Liu, Jiawei Liu
Published	2025-02-25 14:57:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
