Massive Editing for Large Language Models via Meta Learning

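Summary

Large language models (LLMs) learn knowledge from their pre-training corpora, but that knowledge may be incorrect from the start or become outdated over time, so the knowledge of a language model (LM) needs to be corrected after training.
Using a hyper-network to generate parameter shifts is a promising approach, but existing hyper-networks scale poorly in the number of edits that can be applied simultaneously.
To mitigate this problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates parameter shift aggregation as a least squares problem and then updates the LM parameters using the normal equation.
To allow many facts to be edited simultaneously within a limited memory budget, we separate the computation on the hyper-network from that on the LM, which enables arbitrary batch sizes on both networks.
We evaluate our method by editing up to thousands of facts on LMs with different architectures (BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B)) across knowledge-intensive NLP tasks, namely closed-book fact-checking and question answering.
Remarkably, MALMEN can edit hundreds of times more facts than strong baselines that use the identical hyper-network architecture, and it outperforms editors designed specifically for GPT.
Our code is available at https://github.com/ChenmienTan/malmen.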
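
The summary describes aggregating parameter shifts by solving a least squares problem with the normal equation. The NumPy sketch below illustrates that general idea only and is not taken from the MALMEN repository: the function name `aggregate_shifts`, the array names `keys` and `value_shifts`, and the ridge term `lam` are all assumptions introduced for illustration. It finds a single shift matrix S for one linear layer such that S applied to each edit's input vector approximates that edit's desired output change.

```python
import numpy as np


def aggregate_shifts(keys, value_shifts, lam=1e-3):
    """Solve min_S sum_i ||S k_i - v_i||^2 + lam * ||S||_F^2 in closed form.

    keys:         (n_edits, d_in)  hypothetical layer inputs, one row per edit
    value_shifts: (n_edits, d_out) hypothetical desired output changes
    lam:          assumed ridge term that keeps the system well conditioned
    Returns S with shape (d_out, d_in).
    """
    d_in = keys.shape[1]
    # Normal equation: (K^T K + lam * I) S^T = K^T V
    gram = keys.T @ keys + lam * np.eye(d_in)
    rhs = keys.T @ value_shifts
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(gram, rhs).T


# Toy check: when the targets come from a known shift, it is recovered.
rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))        # 8 edits, input width 4
true_shift = rng.normal(size=(3, 4))  # output width 3
value_shifts = keys @ true_shift.T
shift = aggregate_shifts(keys, value_shifts, lam=1e-8)
```

In this toy setting the edits are mutually consistent, so one shift satisfies all of them. When edits conflict, the least squares solution trades them off against each other, which is the case where a principled aggregation rule matters.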

Summary (original)

While large language models (LLMs) have enabled learning knowledge from the pre-training corpora, the acquired knowledge may be fundamentally incorrect or outdated over time, which necessitates rectifying the knowledge of the language model (LM) after the training. A promising approach involves employing a hyper-network to generate parameter shift, whereas existing hyper-networks suffer from inferior scalability in synchronous editing operation amount. To mitigate the problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates the parameter shift aggregation as the least square problem, subsequently updating the LM parameters using the normal equation. To accommodate editing multiple facts simultaneously with limited memory budgets, we separate the computation on the hyper-network and LM, enabling arbitrary batch size on both neural networks. Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed book fact-checking and question answering. Remarkably, MALMEN is capable of editing hundreds of times more facts than strong baselines with the identical hyper-network architecture and outperforms editor specifically designed for GPT. Our code is available at https://github.com/ChenmienTan/malmen.

arXiv information

Authors	Chenmien Tan, Ge Zhang, Jie Fu
Published	2023-11-09 11:07:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
