Neural Redshift: Random Networks are not Random Functions

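Summary

Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete.
Prevailing explanations rely on the implicit biases of gradient descent (GD), but they cannot account for the capabilities of models obtained with gradient-free methods, nor for the simplicity bias recently observed in untrained networks.
This paper looks for other sources of generalization in NNs.
Findings.
To understand the inductive biases that architectures provide independently of GD, the paper examines untrained networks with random weights.
Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a distribution of functions that is heavily skewed in terms of complexity.
Contrary to common wisdom, however, NNs do not have an inherent "simplicity bias".
This property depends on components such as ReLUs, residual connections, and layer normalizations.
Alternative architectures can be built with a bias toward any level of complexity.
Transformers inherit all of these properties from their building blocks.
Implications.
The paper offers a fresh explanation for the success of deep learning that does not depend on gradient-based training.
It points to promising avenues for controlling the solutions implemented by trained models.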
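
The claim that uniform sampling in weight space gives a complexity-skewed distribution of functions can be probed with a small experiment. The following is a minimal sketch and not the authors' code. It assumes a 1D input grid, weights drawn uniformly from [-1, 1], a magnitude-weighted mean Fourier frequency as the complexity measure, and a sine activation as the contrast to ReLU. The names random_mlp, mean_frequency, and average_complexity are hypothetical, and the width, depth, and grid size are arbitrary choices.

```python
import numpy as np


def relu(z):
    """Rectified linear unit."""
    return np.maximum(z, 0.0)


def random_mlp(x, activation, rng, width=64, depth=3, scale=1.0):
    """Evaluate an untrained MLP on 1D inputs x.

    Weights and biases are drawn uniformly from [-scale, scale]
    (an assumed sampling scheme for this sketch).
    """
    h = x.reshape(-1, 1)
    fan_in = 1
    for _ in range(depth):
        w = rng.uniform(-scale, scale, size=(fan_in, width))
        b = rng.uniform(-scale, scale, size=width)
        h = activation(h @ w + b)
        fan_in = width
    w_out = rng.uniform(-scale, scale, size=(fan_in, 1))
    return (h @ w_out).ravel()


def mean_frequency(y):
    """Magnitude-weighted mean Fourier frequency index of a 1D signal.

    The mean is removed first so the constant component does not count.
    A (near-)constant signal is assigned complexity 0.
    """
    y = y - y.mean()
    mag = np.abs(np.fft.rfft(y))
    total = mag.sum()
    if total < 1e-12:
        return 0.0
    freqs = np.arange(mag.size)
    return float((freqs * mag).sum() / total)


def average_complexity(activation, n_networks=100, n_points=256, seed=0):
    """Average complexity of functions implemented by random networks."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)
    scores = [mean_frequency(random_mlp(x, activation, rng))
              for _ in range(n_networks)]
    return float(np.mean(scores))


if __name__ == "__main__":
    print("ReLU networks:", average_complexity(relu))
    print("sine networks:", average_complexity(np.sin))
```

Running the script prints one average score per activation, so the two architectures can be compared under the same sampling scheme. The discrete Fourier transform treats the grid as periodic, so the scores include some spectral leakage from the mismatch at the interval ends. They are only meaningful as a relative comparison.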

Summary (original)

Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But unlike common wisdom, NNs do not have an inherent ‘simplicity bias’. This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. Implications. We provide a fresh explanation for the success of deep learning independent from gradient-based training. It points at promising avenues for controlling the solutions implemented by trained models.

arXiv information

Authors	Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
Published	2024-03-05 11:43:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
