Behind the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models

Summary

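Small language models (SLMs) are increasingly prominent in edge-device deployment because of their high efficiency and low computational cost.
Researchers continue to advance SLM capabilities through innovative training strategies and model compression techniques, yet the security risks of SLMs have received far less attention than those of large language models (LLMs). To fill this gap, we present a comprehensive empirical study that evaluates the security of 13 state-of-the-art SLMs under various jailbreak attacks.
Our experiments show that most SLMs are highly susceptible to existing jailbreak attacks, and some are vulnerable even to direct harmful prompts. To address these safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in strengthening SLM security.
We further analyze the potential security degradation caused by various SLM techniques, including architecture compression, quantization, and knowledge distillation.
We expect this work to highlight the security challenges of SLMs and to offer valuable insights for future work on more robust and secure SLMs.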

Abstract (original)

Small language models (SLMs) have become increasingly prominent in the deployment on edge devices due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention compared to large language models (LLMs). To fill this gap, we provide a comprehensive empirical study to evaluate the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, while some of them are even vulnerable to direct harmful prompts. To address the safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques including architecture compression, quantization, knowledge distillation, and so on. We expect that our research can highlight the security challenges of SLMs and provide valuable insights to future work in developing more robust and secure SLMs.

arXiv information

Authors	Sibo Yi, Tianshuo Cong, Xinlei He, Qi Li, Jiaxing Song
Published	2025-02-28 12:59:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
