Sparse Attention Decomposition Applied to Circuit Tracing

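Summary

Many papers have shown that attention heads work together to perform complex tasks.
Communication between attention heads is often assumed to take place through the addition of specific features to token residuals.
This work seeks to isolate and identify the features used to effect communication and coordination among attention heads in GPT-2 small.
The key leverage on the problem is to show that these features are very often sparsely coded in the singular vectors of attention head matrices.
The authors characterize the dimensionality and occurrence of these signals across the attention heads of GPT-2 small when it is used for the Indirect Object Identification (IOI) task.
The sparse encoding of signals provided by the attention head singular vectors allows signals to be separated efficiently from the residual background and makes communication paths between attention heads straightforward to identify.
The effectiveness of the approach is explored by tracing portions of the circuits used in the IOI task.
The traces reveal considerable detail not present in previous studies and shed light on the nature of the redundant paths present in GPT-2.
The traces also go beyond previous work by identifying the features used to communicate between attention heads when performing IOI.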
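
The idea of signals being sparsely coded in singular vectors can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the pre-softmax attention score is the bilinear form x_dst^T (W_Q W_K^T) x_src, and it omits biases, layer normalization, and the 1/sqrt(d_head) scaling. The weights are random stand-ins rather than GPT-2's, and all names (`W_Q`, `W_K`, `Omega`, `x_dst`, `x_src`, `contrib`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 16, 4  # toy sizes; GPT-2 small uses 768 and 64

# Hypothetical random query/key projections standing in for one head's weights.
W_Q = rng.standard_normal((d_model, d_head))
W_K = rng.standard_normal((d_model, d_head))

# Assumed bilinear form: score(x_dst, x_src) = x_dst @ Omega @ x_src
Omega = W_Q @ W_K.T  # rank is at most d_head

# Keep only the d_head singular triples that can be nonzero.
U, S, Vt = np.linalg.svd(Omega)
U, S, Vt = U[:, :d_head], S[:d_head], Vt[:d_head]

# Toy residuals: small random background plus a planted signal along
# singular pair k = 1 (left vector on the destination, right on the source).
k_planted = 1
x_dst = 0.1 * rng.standard_normal(d_model) + 3.0 * U[:, k_planted]
x_src = 0.1 * rng.standard_normal(d_model) + 3.0 * Vt[k_planted]

# The score splits into one term per singular triple:
#   score = sum_k S[k] * (x_dst . u_k) * (v_k . x_src)
contrib = S * (U.T @ x_dst) * (Vt @ x_src)
score = x_dst @ Omega @ x_src

top = int(np.argmax(np.abs(contrib)))
print("per-component contributions:", np.round(contrib, 3))
print("dominant component:", top)
```

In this toy setting almost all of the score comes from a single term of the sum, so projecting the residuals onto that one singular vector pair separates the planted signal from the background. Whether real heads behave this way is the empirical question the paper studies.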

Summary (original)

Many papers have shown that attention heads work in conjunction with each other to perform complex tasks. It’s frequently assumed that communication between attention heads is via the addition of specific features to token residuals. In this work we seek to isolate and identify the features used to effect communication and coordination among attention heads in GPT-2 small. Our key leverage on the problem is to show that these features are very often sparsely coded in the singular vectors of attention head matrices. We characterize the dimensionality and occurrence of these signals across the attention heads in GPT-2 small when used for the Indirect Object Identification (IOI) task. The sparse encoding of signals, as provided by attention head singular vectors, allows for efficient separation of signals from the residual background and straightforward identification of communication paths between attention heads. We explore the effectiveness of this approach by tracing portions of the circuits used in the IOI task. Our traces reveal considerable detail not present in previous studies, shedding light on the nature of redundant paths present in GPT-2. And our traces go beyond previous work by identifying features used to communicate between attention heads when performing IOI.

arXiv information

Authors	Gabriel Franco, Mark Crovella
Published	2024-10-10 16:03:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
