Why do CNNs Learn Consistent Representations in their First Layer Independent of Labels and Architecture?

Abstract

It has previously been observed that the filters learned in the first layer of a CNN are qualitatively similar for different networks and tasks. We extend this finding and show a high quantitative similarity between filters learned by different networks. We consider the CNN filters as a filter bank and measure the sensitivity of the filter bank to different frequencies. We show that the sensitivity profile of different networks is almost identical, yet far from initialization. Remarkably, we show that it remains the same even when the network is trained with random labels. To understand this effect, we derive an analytic formula for the sensitivity of the filters in the first layer of a linear CNN. We prove that when the average patch in images of the two classes is identical, the sensitivity profile of the filters in the first layer will be identical in expectation when using the true labels or random labels and will only depend on the second-order statistics of image patches. We empirically demonstrate that the average patch assumption holds for realistic datasets. Finally, we show that the energy profile of filters in nonlinear CNNs is highly correlated with the energy profile of linear CNNs and that our analysis of linear networks allows us to predict when representations learned by state-of-the-art networks trained on benchmark classification tasks will depend on the labels.
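The abstract does not state how the sensitivity of a filter bank to a frequency is computed. As a rough illustration only, the sketch below assumes it can be approximated by the total squared magnitude of the filters' 2D discrete Fourier transform at each frequency, normalized to sum to 1. The function names `dft2` and `energy_profile` and the toy 2x2 filters are hypothetical and not taken from the paper, whose exact definition may differ.

```python
import cmath
import math


def dft2(filt):
    """Naive 2D discrete Fourier transform of a square k x k filter."""
    k = len(filt)
    out = [[0j] * k for _ in range(k)]
    for u in range(k):
        for v in range(k):
            total = 0j
            for x in range(k):
                for y in range(k):
                    angle = -2.0 * math.pi * (u * x + v * y) / k
                    total += filt[x][y] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def energy_profile(filter_bank):
    """Squared DFT magnitude summed over all filters, normalized to sum to 1.

    Assumption: this stands in for the "sensitivity profile" of the bank.
    """
    k = len(filter_bank[0])
    energy = [[0.0] * k for _ in range(k)]
    for filt in filter_bank:
        spectrum = dft2(filt)
        for u in range(k):
            for v in range(k):
                energy[u][v] += abs(spectrum[u][v]) ** 2
    total = sum(sum(row) for row in energy)
    return [[e / total for e in row] for row in energy]


# Toy bank (hypothetical): an averaging filter and a horizontal-difference filter.
bank = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, -1.0], [1.0, -1.0]],
]
profile = energy_profile(bank)
for row in profile:
    print([round(e, 3) for e in row])
```

Under this assumption, two networks could be compared by computing the profile of each first-layer filter bank and correlating the two profiles entry by entry.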

arXiv information

Authors	Rhea Chowers, Yair Weiss
Published	2022-06-06 09:33:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
