Grokking as Compression: A Nonlinear Complexity Perspective

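Summary

We attribute grokking, the phenomenon in which generalization is substantially delayed after memorization, to compression.
To this end, we define the linear mapping number (LMN) to measure network complexity; it is a generalized version of the linear region number for ReLU networks.
LMN can nicely characterize the compression a neural network undergoes before it generalizes.
Although the $L_2$ norm is a common choice for characterizing model complexity, we favor LMN for several reasons. (1) LMN can be naturally interpreted as information/computation, whereas $L_2$ cannot.
(2) In the compression phase, LMN has a linear relation with the test loss, whereas $L_2$ correlates with the test loss in a complicated nonlinear way.
(3) LMN also reveals an intriguing phenomenon in which the XOR network switches between two generalizing solutions, whereas $L_2$ does not.
Besides explaining grokking, we argue that LMN is a promising candidate for a neural network version of Kolmogorov complexity, because it explicitly accounts for local or conditioned linear computations, in keeping with the nature of modern artificial neural networks.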
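
To make the notion of a linear region number for ReLU networks concrete, here is a minimal sketch that counts the distinct ReLU activation patterns a small network shows over a set of sample inputs. Each distinct pattern corresponds to one region on which the network is a single linear map. This is not the paper's LMN, whose exact definition is not given in this summary; the function names, the toy weights, and the sampling grid are all assumptions chosen for illustration, and the count is only a lower bound because it covers just the regions the samples happen to visit.

```python
from typing import List, Sequence, Tuple

# One hidden layer is (weight rows, biases); these types are illustrative only.
Layer = Tuple[List[List[float]], List[float]]


def activation_pattern(hidden_layers: Sequence[Layer],
                       x: Sequence[float]) -> Tuple[bool, ...]:
    """Return the on/off state of every hidden ReLU unit for input x."""
    pattern: List[bool] = []
    h = list(x)
    for weights, biases in hidden_layers:
        # Pre-activation of each unit in this layer.
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        pattern.extend(p > 0.0 for p in pre)
        # ReLU output feeds the next layer.
        h = [max(p, 0.0) for p in pre]
    return tuple(pattern)


def count_linear_regions(hidden_layers: Sequence[Layer],
                         samples: Sequence[Sequence[float]]) -> int:
    """Count distinct activation patterns seen on the samples (a lower bound)."""
    return len({activation_pattern(hidden_layers, x) for x in samples})


# Hypothetical toy network: 1-D input, three ReLU units with kinks at x = 0, 1, 2.
toy_layers: List[Layer] = [([[1.0], [1.0], [1.0]], [0.0, -1.0, -2.0])]

# Sample the interval [-1, 3] in steps of 0.25.
grid = [[-1.0 + 0.25 * k] for k in range(17)]

n_regions = count_linear_regions(toy_layers, grid)
print(n_regions)  # 4: the three kinks split the line into four linear pieces
```

Sampling is used here only for simplicity; for larger networks the number of regions can grow quickly with width and depth, so a sampled count should be read as an estimate rather than an exact value.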

Summary (original)

We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. To do so, we define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although the $L_2$ norm has been a popular choice for characterizing model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while $L_2$ cannot. (2) In the compression phase, LMN has linear relations with test losses, while $L_2$ is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while $L_2$ does not. Besides explaining grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.

arXiv information

Authors	Ziming Liu, Ziqian Zhong, Max Tegmark
Published	2023-10-09 17:59:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
