Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials

Summary

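Grokking has been actively studied to explain the mystery of delayed generalization, and identifying interpretable representations and algorithms inside grokked models offers a suggestive hint toward understanding its mechanism.
For modular addition, grokked Transformers are known to implement a Fourier representation together with circuits that compute on it using trigonometric identities.
Given the periodicity of modular arithmetic, a natural question is how far these explanations and interpretations extend to grokking on modular operations other than addition.
To look more closely, we first hypothesize that (1) any modular operation can be characterized by a distinctive Fourier representation or internal circuit, (2) grokked models acquire common features that transfer among similar operations, and (3) mixing datasets of similar operations promotes grokking.
We then examine these hypotheses extensively by training Transformers on complex modular arithmetic tasks, including polynomials.
Our Fourier analysis and two new progress measures for modular arithmetic, Fourier Frequency Density and Fourier Coefficient Ratio, characterize the distinctive internal representations of grokked models for each modular operation.
For instance, polynomials often yield a superposition of the Fourier components seen in elementary arithmetic, whereas no clear pattern emerges for challenging non-factorizable polynomials.
In contrast, our ablation study on pre-grokked models shows that transferability among models grokked on each operation is limited to specific combinations, such as from elementary arithmetic to linear expressions.
Moreover, some multi-task mixtures lead to co-grokking, where grokking happens simultaneously for all tasks, and accelerate generalization, while others fail to find optimal solutions.
We provide empirical steps towards the interpretability of internal circuits.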
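
To make the trigonometric-identity claim for modular addition concrete, here is a minimal NumPy sketch of that style of computation. It is an illustration under stated assumptions, not code from the paper: the function name `modular_add_via_fourier`, the modulus `p = 97`, and the frequency set `[1, 5, 14]` are all chosen here for the example.

```python
import numpy as np

def modular_add_via_fourier(a, b, p, freqs):
    """Recover (a + b) mod p from sine/cosine features at a few frequencies.

    Hypothetical helper for illustration; freqs must be nonzero mod p.
    """
    w = 2 * np.pi * np.asarray(freqs) / p  # angular frequencies w_k = 2*pi*k/p
    cos_a, sin_a = np.cos(w * a), np.sin(w * a)
    cos_b, sin_b = np.cos(w * b), np.sin(w * b)

    # Angle-addition identities give cos(w(a+b)) and sin(w(a+b))
    # from the features of a and b alone.
    cos_ab = cos_a * cos_b - sin_a * sin_b
    sin_ab = sin_a * cos_b + cos_a * sin_b

    # Logit for each candidate c is sum_k cos(w_k (a + b - c)),
    # expanded with the angle-difference identity.
    c = np.arange(p)
    wc = np.outer(w, c)  # shape (len(freqs), p)
    logits = (cos_ab[:, None] * np.cos(wc) + sin_ab[:, None] * np.sin(wc)).sum(axis=0)

    # Every cosine equals 1 only when a + b - c is a multiple of p,
    # so the largest logit sits at c = (a + b) mod p.
    return int(np.argmax(logits))

p = 97
print(modular_add_via_fourier(40, 80, p, [1, 5, 14]), (40 + 80) % p)
```

The sketch uses only a handful of frequencies, which mirrors the idea that a sparse set of Fourier components can be enough to represent the operation for a prime modulus.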

Summary (original)

Grokking has been actively explored to reveal the mystery of delayed generalization and identifying interpretable representations and algorithms inside the grokked models is a suggestive hint to understanding its mechanism. Grokking on modular addition has been known to implement Fourier representation and its calculation circuits with trigonometric identities in Transformers. Considering the periodicity in modular arithmetic, the natural question is to what extent these explanations and interpretations hold for the grokking on other modular operations beyond addition. For a closer look, we first hypothesize that any modular operations can be characterized with distinctive Fourier representation or internal circuits, grokked models obtain common features transferable among similar operations, and mixing datasets with similar operations promotes grokking. Then, we extensively examine them by learning Transformers on complex modular arithmetic tasks, including polynomials. Our Fourier analysis and novel progress measure for modular arithmetic, Fourier Frequency Density and Fourier Coefficient Ratio, characterize distinctive internal representations of grokked models per modular operation; for instance, polynomials often result in the superposition of the Fourier components seen in elementary arithmetic, but clear patterns do not emerge in challenging non-factorizable polynomials. In contrast, our ablation study on the pre-grokked models reveals that the transferability among the models grokked with each operation can be only limited to specific combinations, such as from elementary arithmetic to linear expressions. Moreover, some multi-task mixtures may lead to co-grokking — where grokking simultaneously happens for all the tasks — and accelerate generalization, while others may not find optimal solutions. We provide empirical steps towards the interpretability of internal circuits.
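
As a rough illustration of what Fourier analysis of an internal representation can look like, the sketch below takes a discrete Fourier transform of an embedding matrix along the token axis and lists the frequencies that carry most of the norm. It does not reproduce the paper's Fourier Frequency Density or Fourier Coefficient Ratio, whose definitions are not given in the abstract; the helper names, the threshold, and the synthetic embedding are assumptions made for this example.

```python
import numpy as np

def frequency_norms(embedding):
    """L2 norm of each Fourier frequency of a (p, d) embedding over the token axis."""
    spectrum = np.fft.rfft(embedding, axis=0)  # shape (p // 2 + 1, d)
    return np.linalg.norm(spectrum, axis=1)

def dominant_frequencies(embedding, rel_threshold=0.5):
    """Frequencies whose norm exceeds rel_threshold times the largest norm.

    rel_threshold is an arbitrary illustrative cutoff, not a value from the paper.
    """
    norms = frequency_norms(embedding)
    return [int(k) for k in np.flatnonzero(norms > rel_threshold * norms.max())]

# Synthetic stand-in for a grokked embedding: three sinusoids plus small noise.
p, d = 97, 32
rng = np.random.default_rng(0)
tokens = np.arange(p)[:, None]
toy = sum(
    np.cos(2 * np.pi * k * tokens / p) * rng.normal(size=(1, d))
    + np.sin(2 * np.pi * k * tokens / p) * rng.normal(size=(1, d))
    for k in (3, 17, 41)
)
toy = toy + 0.01 * rng.normal(size=(p, d))

print(dominant_frequencies(toy))
```

On a trained model, `toy` would be replaced by the actual embedding weights; a sparse set of dominant frequencies would be consistent with the Fourier representation described above, while a diffuse spectrum would suggest no clear pattern.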

arXiv information

Authors	Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
Published	2024-12-30 11:00:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
