Learning to Attribute with Attention

Summary

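Given a sequence of tokens generated by a language model, we may want to identify which preceding tokens influenced the model to generate it.
Performing this kind of token attribution is expensive.
A common approach is to ablate preceding tokens and measure their effects directly.
To reduce the cost of token attribution, we revisit attention weights as a heuristic for how a language model uses previous tokens.
Naive approaches to attributing model behavior with attention (for example, averaging attention weights across attention heads to estimate a token's influence) have been found to be unreliable.
To attain faithful attributions, we propose treating the attention weights of different attention heads as features.
This makes it possible to learn how to use attention weights effectively for attribution, using signal from ablations.
The resulting method, Attribution with Attention (AT2), reliably performs on par with approaches that involve many ablations while being significantly more efficient.
To showcase the utility of AT2, we use it to prune less important parts of a provided context in a question-answering setting, which improves answer quality.
Code for AT2: https://github.com/madrylab/at2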

Abstract (original)

Given a sequence of tokens generated by a language model, we may want to identify the preceding tokens that influence the model to generate this sequence. Performing such token attribution is expensive; a common approach is to ablate preceding tokens and directly measure their effects. To reduce the cost of token attribution, we revisit attention weights as a heuristic for how a language model uses previous tokens. Naive approaches to attribute model behavior with attention (e.g., averaging attention weights across attention heads to estimate a token’s influence) have been found to be unreliable. To attain faithful attributions, we propose treating the attention weights of different attention heads as features. This way, we can learn how to effectively leverage attention weights for attribution (using signal from ablations). Our resulting method, Attribution with Attention (AT2), reliably performs on par with approaches that involve many ablations, while being significantly more efficient. To showcase the utility of AT2, we use it to prune less important parts of a provided context in a question answering setting, improving answer quality. We provide code for AT2 at https://github.com/MadryLab/AT2 .
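
The idea of treating per-head attention weights as features and learning how to combine them can be illustrated with a toy sketch. This is not the AT2 implementation: the data is synthetic, the linear scoring form and the least-squares fit are assumptions chosen for illustration, and every name here (`head_scores`, `fit_head_weights`, `TRUE_THETA`, and so on) is hypothetical. In a real setting, the attention weights would come from a language model and the target effects from actual ablations.

```python
import random

def head_scores(attn, theta):
    """Score each source token as a weighted sum of its per-head attention."""
    return [sum(t * head[s] for t, head in zip(theta, attn))
            for s in range(len(attn[0]))]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_head_weights(examples, num_heads):
    """Least-squares fit of one coefficient per head (normal equations).

    examples: list of (attn, effects), where attn[h][s] is the attention
    weight of head h on source token s, and effects[s] is the measured
    effect of ablating token s (synthetic in this sketch).
    """
    gram = [[0.0] * num_heads for _ in range(num_heads)]
    rhs = [0.0] * num_heads
    for attn, effects in examples:
        for s, y in enumerate(effects):
            feats = [attn[h][s] for h in range(num_heads)]
            for i in range(num_heads):
                rhs[i] += feats[i] * y
                for j in range(num_heads):
                    gram[i][j] += feats[i] * feats[j]
    return solve_linear(gram, rhs)

def random_attention(rng, num_heads, num_tokens):
    """Random attention rows, each normalized to sum to 1 over tokens."""
    rows = []
    for _ in range(num_heads):
        raw = [rng.random() for _ in range(num_tokens)]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows

def mean_squared_error(examples, theta):
    errors = [(p - y) ** 2
              for attn, effects in examples
              for p, y in zip(head_scores(attn, theta), effects)]
    return sum(errors) / len(errors)

# Synthetic stand-in for real data: effects are generated from hidden
# per-head coefficients, so only some heads are informative.
rng = random.Random(0)
TRUE_THETA = [0.0, 2.0, -0.5]  # hypothetical ground truth for the toy
examples = []
for _ in range(20):
    attn = random_attention(rng, num_heads=3, num_tokens=6)
    effects = head_scores(attn, TRUE_THETA)  # stands in for ablation results
    examples.append((attn, effects))

learned = fit_head_weights(examples, num_heads=3)
naive = [1.0 / 3] * 3  # plain averaging across heads

print("learned head weights:", [round(t, 3) for t in learned])
print("MSE, naive head average:", mean_squared_error(examples, naive))
print("MSE, learned weights:", mean_squared_error(examples, learned))
```

In this toy, averaging across heads gives every head equal say, including the uninformative one, while the fitted coefficients recover which heads track the ablation effects. The ablation cost is paid once when fitting; scoring a new example afterwards needs only its attention weights.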

arXiv information

Authors	Benjamin Cohen-Wang, Yung-Sung Chuang, Aleksander Madry
Published	2025-04-18 15:36:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
