ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models

Summary

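Feature attribution methods (FAs), such as gradients and attention, are widely used to derive the importance of every input feature to a model's prediction.
Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks.
However, because model architectures and task settings differ inherently, it is unclear whether these FAs can be used faithfully for decoder-only models in text generation.
Moreover, previous work has shown that no single FA performs best across all models and tasks (there is no "one-wins-all" FA).
This makes selecting an FA computationally expensive for large LMs, since deriving input importance often requires multiple forward and backward passes, including gradient computations that may be prohibitive even with access to large compute.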
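To address these issues, we present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent).
Our method updates the token importance distribution recursively.
At each update, we compute the difference between the probability distributions over the vocabulary for predicting the next token when using the original input and when using a modified version in which part of the input is replaced with RoBERTa predictions.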
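The per-update comparison can be sketched as a distance between two next-token distributions. The summary does not name the measure, so this minimal sketch assumes total variation distance; the function name `total_variation` and the toy probabilities are hypothetical, and the paper may use a different quantity.

```python
from typing import Dict

# A next-token distribution: vocabulary item -> probability.
Dist = Dict[str, float]


def total_variation(p: Dist, q: Dist) -> float:
    """Half the L1 distance between two distributions over the vocabulary.

    Assumption: total variation is only one illustrative way to measure the
    "difference in the probability distribution"; it is not taken from the paper.
    """
    vocab = set(p) | set(q)  # a token missing from one side counts as probability 0
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocab)


# Toy distributions: original input vs. input with some tokens replaced.
original = {"Paris": 0.8, "Lyon": 0.15, "Rome": 0.05}
modified = {"Paris": 0.3, "Lyon": 0.5, "Rome": 0.2}
print(total_variation(original, modified))  # ≈ 0.5
```

Identical distributions give 0 and distributions with disjoint support give 1, so a larger value means the replacement shifted the prediction more.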
Our intuition is that replacing an important token in the context should change the model's confidence in predicting the next token more than replacing an unimportant token.
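The recursive loop and the intuition behind it can be illustrated with a toy, fully black-box sketch. Everything here is an assumption for illustration: `reagent_sketch`, `toy_predict`, and `toy_replace` are hypothetical names, a fixed step count stands in for whatever stopping rule the paper uses, `toy_replace` returns a placeholder instead of querying RoBERTa, each perturbation is scored by the drop in the target token's probability (a full-vocabulary distance could be substituted), and importance is a running mean of those drops. It is not the authors' update rule.

```python
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]


def reagent_sketch(
    tokens: List[str],
    target: str,
    predict: Callable[[List[str]], Dist],
    replace: Callable[[List[str], int], str],
    n_steps: int = 50,
    n_replace: int = 2,
    seed: int = 0,
) -> List[float]:
    """Illustrative perturbation loop (requires n_replace <= len(tokens))."""
    rng = random.Random(seed)
    n = len(tokens)
    score = [0.0] * n  # running mean of the confidence drop per position
    count = [0] * n    # how often each position has been replaced
    base = predict(tokens).get(target, 0.0)
    for _ in range(n_steps):
        chosen = set(rng.sample(range(n), n_replace))
        modified = [replace(tokens, i) if i in chosen else t
                    for i, t in enumerate(tokens)]
        drop = base - predict(modified).get(target, 0.0)
        for i in chosen:
            # Recursive (incremental) mean: new = old + (x - old) / count.
            count[i] += 1
            score[i] += (drop - score[i]) / count[i]
    clipped = [max(s, 0.0) for s in score]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / n] * n  # no signal: fall back to uniform importance
    return [s / total for s in clipped]


def toy_predict(tokens: List[str]) -> Dist:
    # Hypothetical LM: "Paris" is likely only when "France" is in the context.
    p = 0.9 if "France" in tokens else 0.2
    return {"Paris": p, "<other>": 1.0 - p}


def toy_replace(tokens: List[str], i: int) -> str:
    # Hypothetical stand-in for a RoBERTa fill-in prediction at position i.
    return "<filler>"


context = ["The", "capital", "of", "France", "is"]
weights = reagent_sketch(context, "Paris", toy_predict, toy_replace)
print(max(range(len(context)), key=weights.__getitem__))  # prints 3, the position of "France"
```

Only "France" changes the toy model's confidence, so every replacement set that contains it produces the largest drop, and it ends up with the highest normalized importance; tokens that were merely co-replaced with it receive only a fraction of that credit.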
Unlike most other FAs, our method requires no access to internal model weights and no additional training or fine-tuning, so it can be applied universally to any generative LM.
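Being model-agnostic means the method only has to query the model with a token sequence and read back its next-token distribution. A minimal sketch of such a black-box boundary, assuming a model that returns next-token logits; `BlackBoxLM`, `next_token_distribution`, and `LookupLM` are hypothetical names, not an API from the paper or any library.

```python
import math
from typing import Dict, List, Protocol


class BlackBoxLM(Protocol):
    """Hypothetical interface: the only thing an attribution loop may call."""

    def next_token_logits(self, tokens: List[str]) -> Dict[str, float]: ...


def next_token_distribution(model: BlackBoxLM, tokens: List[str]) -> Dict[str, float]:
    """Turn raw scores into probabilities with a numerically stable softmax."""
    logits = model.next_token_logits(tokens)
    shift = max(logits.values())  # subtract the max to avoid overflow in exp
    exps = {tok: math.exp(v - shift) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


class LookupLM:
    """Toy model; any model exposing the same method would work unchanged."""

    def next_token_logits(self, tokens: List[str]) -> Dict[str, float]:
        return {"Paris": 2.0 if "France" in tokens else 0.0, "Lyon": 0.0}


lm = LookupLM()
print(next_token_distribution(lm, ["The", "capital", "of", "France", "is"]))    # ≈ Paris 0.881, Lyon 0.119
print(next_token_distribution(lm, ["The", "capital", "of", "<filler>", "is"]))  # {'Paris': 0.5, 'Lyon': 0.5}
```

No gradients or weights cross this boundary, so the same attribution loop could wrap any model that can return such scores.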
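We extensively compare the faithfulness of ReAGent against seven popular FAs across six decoder-only LMs of various sizes.
The results show that our method consistently provides more faithful token importance distributions.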
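The summary does not say how faithfulness is measured. One widely used idea is comprehensiveness: if an importance distribution is faithful, masking its top-ranked tokens should lower the model's probability for the predicted token more than masking low-ranked ones. A minimal sketch under that assumption; `comprehensiveness`, `toy_predict`, and the `<mask>` placeholder are hypothetical, and this is not necessarily one of the paper's metrics.

```python
from typing import Callable, Dict, List

Dist = Dict[str, float]


def comprehensiveness(
    tokens: List[str],
    target: str,
    importance: List[float],
    predict: Callable[[List[str]], Dist],
    k: int,
    mask: str = "<mask>",
) -> float:
    """Probability drop for `target` after masking the k highest-ranked tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    top_k = set(ranked[:k])
    masked = [mask if i in top_k else t for i, t in enumerate(tokens)]
    return predict(tokens).get(target, 0.0) - predict(masked).get(target, 0.0)


def toy_predict(tokens: List[str]) -> Dist:
    # Hypothetical LM: "Paris" is likely only when "France" is in the context.
    p = 0.9 if "France" in tokens else 0.2
    return {"Paris": p, "<other>": 1.0 - p}


context = ["The", "capital", "of", "France", "is"]
faithful = [0.05, 0.1, 0.05, 0.7, 0.1]    # ranks "France" first
unfaithful = [0.4, 0.3, 0.2, 0.05, 0.05]  # ranks "France" near the bottom
for name, imp in [("faithful", faithful), ("unfaithful", unfaithful)]:
    print(name, round(comprehensiveness(context, "Paris", imp, toy_predict, k=1), 3))
# prints "faithful 0.7" then "unfaithful 0.0"
```

A larger drop at the same k suggests a more faithful ranking; computing such scores for several attribution methods on the same inputs is one way a faithfulness comparison could be set up.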

Summary (original)

Feature attribution methods (FAs), such as gradients and attention, are widely employed approaches to derive the importance of all input features to the model predictions. Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks. However, it is unknown if it is faithful to use these FAs for decoder-only models on text generation, due to the inherent differences between model architectures and task settings respectively. Moreover, previous work has demonstrated that there is no ‘one-wins-all’ FA across models and tasks. This makes the selection of a FA computationally expensive for large LMs since input importance derivation often requires multiple forward and backward passes including gradient computations that might be prohibitive even with access to large compute. To address these issues, we present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent). Our method updates the token importance distribution in a recursive manner. For each update, we compute the difference in the probability distribution over the vocabulary for predicting the next token between using the original input and using a modified version where a part of the input is replaced with RoBERTa predictions. Our intuition is that replacing an important token in the context should have resulted in a larger change in the model’s confidence in predicting the token than replacing an unimportant token. Our method can be universally applied to any generative LM without accessing internal model weights or additional training and fine-tuning, as most other FAs require. We extensively compare the faithfulness of ReAGent with seven popular FAs across six decoder-only LMs of various sizes. The results show that our method consistently provides more faithful token importance distributions.

arXiv information

Authors	Zhixue Zhao, Boxuan Shan
Published	2024-02-01 17:25:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
