Peeking Behind the Curtains of Residual Learning

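Summary

Residual learning is now widely used in deep, scalable neural networks.
However, the basic principles behind its success remain elusive, which hinders the effective training of plain networks that scale in depth.
In this paper, we peek behind the curtains of residual learning by uncovering the "dissipating inputs" phenomenon, which leads to convergence failure in plain neural networks: non-linearities gradually compromise the input as it passes through plain layers, which makes feature representations hard to learn.
We show theoretically how plain neural networks degenerate the input into random noise, and we emphasize the importance of a residual connection, which maintains a better lower bound on surviving neurons, as a solution.
Building on these theoretical findings, we propose "The Plain Neural Net Hypothesis" (PNNH), which identifies the internal path across non-linear layers as the most critical part of residual learning and establishes a paradigm for training deep plain neural networks without residual connections.
We thoroughly evaluate PNNH-enabled CNN architectures and Transformers on popular vision benchmarks, showing on-par accuracy, up to 0.3% higher training throughput, and 2x better parameter efficiency compared with ResNets and vision Transformers.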

Abstract (original)

The utilization of residual learning has become widespread in deep and scalable neural nets. However, the fundamental principles that contribute to the success of residual learning remain elusive, thus hindering effective training of plain nets with depth scalability. In this paper, we peek behind the curtains of residual learning by uncovering the ‘dissipating inputs’ phenomenon that leads to convergence failure in plain neural nets: the input is gradually compromised through plain layers due to non-linearities, resulting in challenges of learning feature representations. We theoretically demonstrate how plain neural nets degenerate the input to random noise and emphasize the significance of a residual connection that maintains a better lower bound of surviving neurons as a solution. With our theoretical discoveries, we propose ‘The Plain Neural Net Hypothesis’ (PNNH) that identifies the internal path across non-linear layers as the most critical part in residual learning, and establishes a paradigm to support the training of deep plain neural nets devoid of residual connections. We thoroughly evaluate PNNH-enabled CNN architectures and Transformers on popular vision benchmarks, showing on-par accuracy, up to 0.3% higher training throughput, and 2x better parameter efficiency compared to ResNets and vision Transformers.
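
The abstract describes inputs being gradually compromised as they pass through plain non-linear layers. The following is a minimal toy probe of that idea, not the paper's analysis or its PNNH method. It assumes randomly initialized ReLU blocks, NumPy, and an arbitrary branch scale of 0.2 for the residual variant. The names `block`, `final_similarity`, and `branch_scale` are hypothetical. The sketch tracks the cosine similarity between two unrelated random inputs: if it drifts toward 1, the stack can no longer tell the inputs apart.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def block(x, w_in, w_out):
    """One non-linear block: linear map, ReLU, linear map."""
    return w_out @ np.maximum(w_in @ x, 0.0)


def final_similarity(depth, width, residual, branch_scale=0.2, seed=0):
    """Push two random inputs through `depth` random blocks and
    return the cosine similarity of the two outputs.

    residual=False: x <- block(x)                      (plain stack)
    residual=True:  x <- x + branch_scale * block(x)   (skip connection)
    branch_scale is an illustrative assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    y = rng.standard_normal(width)
    for _ in range(depth):
        # Fresh random weights per block; the variances keep norms roughly stable.
        w_in = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        w_out = rng.standard_normal((width, width)) * np.sqrt(1.0 / width)
        if residual:
            x = x + branch_scale * block(x, w_in, w_out)
            y = y + branch_scale * block(y, w_in, w_out)
        else:
            x = block(x, w_in, w_out)
            y = block(y, w_in, w_out)
    return cosine(x, y)


if __name__ == "__main__":
    for use_skip in (False, True):
        sim = final_similarity(depth=50, width=512, residual=use_skip)
        print(f"residual={use_skip}: output cosine similarity = {sim:.3f}")
```

Under these assumptions, the plain stack should end with a similarity close to 1, while the skip-connected stack should keep the two inputs clearly more distinct. The gap depends on the chosen branch scale, depth, and width, so it illustrates the trend only and does not reproduce any result from the paper.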

arXiv information

Authors	Tunhou Zhang, Feng Yan, Hai Li, Yiran Chen
Published	2024-02-13 18:24:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
