Position: Editing Large Language Models Poses Serious Safety Risks

Abstract

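Large language models (LLMs) store a large number of facts about the world.
These facts can become outdated over time, which has motivated the development of knowledge editing methods (KEs) that change specific facts in an LLM with limited side effects.
This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked.
First, we note that KEs are widely available, computationally inexpensive, highly performant, and stealthy, which makes them an attractive tool for malicious actors.
Second, we discuss malicious use cases of KEs and show how easily KEs can be adapted for a variety of malicious purposes.
Third, we highlight vulnerabilities in the AI ecosystem that allow updated models to be uploaded and downloaded without restriction or verification.
Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and we discuss the implications for different stakeholders.
We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.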

Abstract (original)

Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note the fact that KEs are widely available, computationally inexpensive, highly performant, and stealthy makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.

arXiv information

Authors	Paul Youssef, Zhixue Zhao, Daniel Braun, Jörg Schlötterer, Christin Seifert
Published	2025-06-10 14:34:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
