Neighboring Perturbations of Knowledge Editing on Large Language Models

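Abstract

Despite their strong capabilities, large language models (LLMs) tend to generate unintended text because of incorrect or outdated knowledge.
Because retraining LLMs is resource-intensive, work on knowledge editing has grown markedly.
However, current approaches and evaluations rarely examine how an edit perturbs neighboring knowledge.
This paper studies whether updating an LLM with new knowledge perturbs the neighboring knowledge encapsulated within it.
Specifically, it asks whether appending a new answer to the answer list for a factual question causes catastrophic forgetting of the original correct answers in that list, or the unintended inclusion of incorrect answers.
A metric of additivity is introduced, and a benchmark called Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when new knowledge is appended.
In addition, a plug-and-play framework called Appending via Preservation and Prevention (APP) is proposed to mitigate neighboring perturbation by maintaining the integrity of the answer list.
Experiments demonstrate the effectiveness of APP when combined with four editing methods on three LLMs.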

Abstract (original)

Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper studies whether updating new knowledge to LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we seek to figure out whether appending a new answer into an answer list to a factual question leads to catastrophic forgetting of original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced and a benchmark dubbed as Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. Besides, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP coupling with four editing methods on three LLMs.
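
The two failure modes named in the abstract (forgetting original correct answers, and unintentionally including incorrect ones) can be illustrated with a small sketch. This is not the paper's additivity metric, which the abstract does not define; the function name, its parameters, the report fields, and the sample answers are all hypothetical and chosen only for illustration.

```python
def perturbation_report(before, after, correct, appended):
    """Compare a model's answer list before and after an edit.

    Hypothetical helper, not the paper's metric. All list arguments are
    collections of answer strings; `appended` is the newly added answer.
    """
    before, after, correct = set(before), set(after), set(correct)
    # Correct answers the model already produced before the edit.
    originally_known = before & correct
    # Everything that should be acceptable after the edit.
    target = correct | {appended}
    return {
        # Did the edit succeed in adding the new answer?
        "appended_ok": appended in after,
        # Correct answers that were present before but are gone now.
        "forgotten": sorted(originally_known - after),
        # Incorrect answers that appear only after the edit.
        "wrongly_included": sorted(after - target - before),
    }


# Made-up answer lists for a single factual question.
report = perturbation_report(
    before=["English", "French"],
    after=["French", "German", "Spanish"],
    correct=["English", "French"],
    appended="German",
)
print(report)
```

In this toy case the edit adds the new answer but also drops one original correct answer and introduces one incorrect answer, which is the kind of neighboring perturbation the abstract describes.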

arXiv information

Authors	Jun-Yu Ma,Jia-Chen Gu,Ningyu Zhang,Zhen-Hua Ling
Published	2024-01-31 06:49:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
