Attack and defense techniques in large language models: A survey and new perspectives

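Abstract

Large language models (LLMs) have become central to many natural language processing tasks, but their vulnerabilities pose major security and ethical challenges. This systematic survey examines how attack and defense techniques for LLMs have evolved. We classify attacks into adversarial prompt attacks, optimized attacks, model theft, and attacks on LLM applications, and detail their mechanisms and implications. We then analyze defense strategies, covering both prevention-based and detection-based methods. Despite progress, challenges remain: adapting to a dynamic threat landscape, balancing usability with robustness, and addressing resource constraints when implementing defenses. We highlight open problems, including the need for adaptive and scalable defenses, explainable security techniques, and standardized evaluation frameworks. The survey offers actionable insights and directions for developing secure and resilient LLMs, and stresses the importance of interdisciplinary collaboration and ethical considerations in mitigating risks in real-world applications.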

Abstract (original)

Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attack, optimized attacks, model theft, as well as attacks on application of LLMs, detailing their mechanisms and implications. Consequently, we analyze defense strategies, including prevention-based and detection-based defense methods. Although advances have been made, challenges remain to adapt to the dynamic threat landscape, balance usability with robustness, and address resource constraints in defense implementation. We highlight open problems, including the need for adaptive scalable defenses, explainable security techniques, and standardized evaluation frameworks. This survey provides actionable insights and directions for developing secure and resilient LLMs, emphasizing the importance of interdisciplinary collaboration and ethical considerations to mitigate risks in real-world applications.

arXiv information

Authors	Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu
Published	2025-05-02 03:37:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
