Group Crosscoders for Mechanistic Analysis of Symmetry

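Summary

We introduce group crosscoders, an extension of crosscoders that systematically discovers and analyses symmetrical features in neural networks.
Neural networks often develop equivariant representations without explicit architectural constraints, but understanding these emergent symmetries has traditionally relied on manual analysis.
Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group.
Applied to the mixed3b layer of InceptionV1 using the dihedral group $\mathrm{D}_{32}$, the method yields several key insights. First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, separating them more precisely than standard sparse autoencoders.
Second, transform block analysis enables the automatic characterisation of feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance.
These results show that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.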
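
The phrase "dictionary learning across transformed versions of inputs under a symmetry group" can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the paper's implementation: it assumes the convention in which the dihedral group $\mathrm{D}_n$ has $n$ rotations and $n$ reflections (the abstract does not say which convention it uses), it acts on 2D point sets rather than images (real images would need resampling), and the names `dihedral_group`, `apply`, `stacked_activations`, and `toy_layer` are hypothetical. It builds only the stacked input that a dictionary learner would consume; the dictionary learning step itself is not shown.

```python
import math


def dihedral_group(n):
    """Return 2n matrices: n rotations and n reflections (assumed convention)."""
    elements = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        c, s = math.cos(theta), math.sin(theta)
        elements.append(((c, -s), (s, c)))  # rotation by k * 360/n degrees
        elements.append(((c, s), (s, -c)))  # reflect across the x-axis, then rotate
    return elements


def apply(g, points):
    """Apply a 2x2 matrix g to every (x, y) point."""
    return [(g[0][0] * x + g[0][1] * y, g[1][0] * x + g[1][1] * y)
            for x, y in points]


def toy_layer(points):
    """Hypothetical stand-in for a network layer: two ReLU-like activations."""
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    return [max(0.0, sum_x), max(0.0, sum_y)]


def stacked_activations(points, activations_fn, group):
    """Concatenate the activations of every transformed copy of the input."""
    stacked = []
    for g in group:
        stacked.extend(activations_fn(apply(g, points)))
    return stacked


group = dihedral_group(32)
example_input = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.25)]
crosscoder_input = stacked_activations(example_input, toy_layer, group)
```

In this sketch each input yields one vector with a block of activations per group element, so a dictionary learned on such vectors would see how a feature responds to every rotation and reflection at once.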

Abstract (original)

We introduce group crosscoders, an extension of crosscoders that systematically discover and analyse symmetrical features in neural networks. While neural networks often develop equivariant representations without explicit architectural constraints, understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1’s mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method reveals several key insights: First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, providing more precise separation than standard sparse autoencoders. Second, our transform block analysis enables the automatic characterisation of feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.

arXiv information

Author	Liv Gorton
Published	2024-10-31 17:47:01+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
