Sparse Double Descent: Where Network Pruning Aggravates Overfitting

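Summary

Network pruning is usually believed not only to reduce the computational cost of deep networks but also to prevent overfitting by decreasing model capacity.
Surprisingly, however, our work finds that network pruning can sometimes even aggravate overfitting.
We report an unexpected sparse double descent phenomenon: as model sparsity increases through network pruning, test performance first gets worse (due to overfitting), then improves (as overfitting is relieved), and finally gets worse again (due to forgetting useful information).
Recent studies have focused on deep double descent with respect to model overparameterization, but they did not recognize that sparsity can also cause double descent.
This paper makes three main contributions.
First, we report the novel sparse double descent phenomenon through extensive experiments.
Second, we propose a learning distance interpretation of this phenomenon: the curve of the $\ell_{2}$ learning distance of sparse models (from initialized parameters to final parameters) may correlate well with the sparse double descent curve and may reflect generalization better than minima flatness.
Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.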

Abstract (original)

People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning sometimes even aggravates overfitting. We report an unexpected sparse double descent phenomenon that, as we increase model sparsity via network pruning, test performance first gets worse (due to overfitting), then gets better (due to relieved overfitting), and gets worse at last (due to forgetting useful information). While recent studies focused on the deep double descent with respect to model overparameterization, they failed to recognize that sparsity may also cause double descent. In this paper, we have three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, for this phenomenon, we propose a novel learning distance interpretation that the curve of $\ell_{2}$ learning distance of sparse models (from initialized parameters to final parameters) may correlate with the sparse double descent curve well and reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.
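
The $\ell_{2}$ learning distance mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the networks are pruned or how the distance is restricted to the remaining weights, so the code below makes two assumptions: pruning is done by global weight magnitude, and the distance is the Euclidean norm of the difference between final and initial parameters over the unpruned entries only. The function names `magnitude_prune_mask` and `l2_learning_distance` are hypothetical and chosen for this illustration.

```python
import math


def magnitude_prune_mask(weights, sparsity):
    """Return a keep-mask that drops the smallest-magnitude weights.

    Assumption: magnitude pruning; the abstract does not name a pruning method.
    `sparsity` is the fraction of weights to remove, between 0 and 1.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


def l2_learning_distance(theta_init, theta_final, mask):
    """Euclidean distance from initial to final parameters.

    Assumption: only the weights kept by `mask` contribute to the distance.
    """
    squared = sum(
        (final - init) ** 2
        for init, final, keep in zip(theta_init, theta_final, mask)
        if keep
    )
    return math.sqrt(squared)


# Toy parameter vectors, made up for illustration only.
theta_init = [0.0, 0.0, 0.0, 0.0]
theta_final = [3.0, 4.0, 0.1, -0.2]

mask = magnitude_prune_mask(theta_final, sparsity=0.5)
print(mask)                                                 # [True, True, False, False]
print(l2_learning_distance(theta_init, theta_final, mask))  # 5.0
```

Repeating this computation over a range of sparsity levels would give a learning distance curve, which is the quantity the abstract suggests may track the sparse double descent curve.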

arXiv information

Authors	Zheng He, Zeke Xie, Quanzhi Zhu, Zengchang Qin
Published	2022-06-17 11:02:15+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
