Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

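Summary

Language models (LMs) acquire parametric knowledge during training and embed it in their weights.
As LMs grow in scale, however, it becomes much harder to understand a model's inner workings, and harder still to update or correct this embedded knowledge without the substantial cost of retraining.
This underscores the importance of revealing exactly what knowledge is stored and how it relates to specific model components.
Instance Attribution (IA) and Neuron Attribution (NA) both offer insights into the knowledge acquired through training, but they have not been compared systematically.
Our study introduces a new evaluation framework to quantify and compare the knowledge revealed by IA and NA.
To align the results of the two methods, we introduce NA-Instances, an attribution method that applies NA to retrieve influential training instances, and IA-Neurons, which discovers the important neurons of the influential instances found by IA.
We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods.
Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information about an LM's parametric knowledge than IA does.
Nevertheless, IA provides unique and valuable insights into the LM's parametric knowledge that NA does not reveal.
Our findings further suggest the potential of a synergistic approach that combines the diverse findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.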

Summary (original)

Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model’s inner workings and further for updating or correcting this embedded knowledge without the significant cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and its association with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the methods we introduce the attribution method NA-Instances to apply NA for retrieving influential training instances, and IA-Neurons to discover important neurons of influential instances discovered by IA. We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information regarding the LM’s parametric knowledge compared to IA. Nevertheless, IA provides unique and valuable insights into the LM’s parametric knowledge, which are not revealed by NA. Our findings further suggest the potential of a synergistic approach of combining the diverse findings of IA and NA for a more holistic understanding of an LM’s parametric knowledge.
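The abstract names comprehensiveness and sufficiency tests but does not define them here. As a rough illustration only, the sketch below uses the definitions common in explanation evaluation: comprehensiveness is the drop in the target-class probability when the top-k attributed neurons are removed, and sufficiency is the drop when only those neurons are kept. The toy one-layer model, the activation-times-weight attribution score, and all function names are assumptions made for this example, not the authors' method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(hidden, w_out, mask=None):
    """Class probabilities from hidden activations.

    mask[i] = 0.0 ablates neuron i; mask[i] = 1.0 keeps it (hypothetical toy model).
    """
    if mask is None:
        mask = [1.0] * len(hidden)
    logits = [sum(h * m * w for h, m, w in zip(hidden, mask, row)) for row in w_out]
    return softmax(logits)

def faithfulness(hidden, w_out, scores, target, k):
    """Comprehensiveness and sufficiency for the top-k neurons by attribution score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    n = len(hidden)
    full = predict(hidden, w_out)[target]
    # Remove the top-k neurons: a large drop means the explanation is comprehensive.
    without_top = predict(hidden, w_out, [0.0 if i in top else 1.0 for i in range(n)])[target]
    # Keep only the top-k neurons: a small drop means the explanation is sufficient.
    only_top = predict(hidden, w_out, [1.0 if i in top else 0.0 for i in range(n)])[target]
    return {"comprehensiveness": full - without_top, "sufficiency": full - only_top}

# Toy data (assumed): 4 hidden neurons, 2 output classes (one weight row per class).
hidden = [2.0, 0.1, 1.5, 0.0]
w_out = [[1.0, 0.2, 0.8, 0.5], [-0.5, 0.1, -0.3, 0.4]]
target = 0

# Assumed attribution score: activation times the weight margin in favour of the target class.
scores = [h * (w_out[0][i] - w_out[1][i]) for i, h in enumerate(hidden)]

result = faithfulness(hidden, w_out, scores, target, k=2)
print(result)
```

In this toy case the two highest-scoring neurons carry almost all of the evidence for the target class, so removing them lowers the probability sharply while keeping only them changes it very little. The same two measurements could in principle be applied to training instances instead of neurons, by removing or retaining the top-ranked instances, though that would require retraining or fine-tuning and is not shown here.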

arXiv information

Authors	Haeun Yu, Pepa Atanasova, Isabelle Augenstein
Published	2024-04-29 12:38:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
