Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Summary

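Double descent is a surprising phenomenon in machine learning: as the number of model parameters grows relative to the number of data points, test error falls as models move ever deeper into the highly overparameterized (data-undersampled) regime.
This drop in test error runs counter to classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning.
This non-monotonic behavior of the test loss depends on the number of data points, the dimensionality of the data, and the number of model parameters.
The paper briefly describes double descent, then explains why it occurs in an informal and approachable way that requires only familiarity with linear algebra and introductory probability.
It gives visual intuition using polynomial regression, analyzes double descent mathematically with ordinary linear regression, and identifies three interpretable factors that create double descent when all are present at once.
It demonstrates that double descent occurs on real data under ordinary linear regression, and that it does not occur when any one of the three factors is ablated.
This understanding is then used to shed light on recent observations about superposition and double descent in nonlinear models.
The code is publicly available.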
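
As a rough illustration of the ordinary linear regression setting described above, the following minimal sketch fits a minimum-norm least-squares model to synthetic Gaussian data and records the test error as the number of training points N varies around the data dimension D. It is not the authors' published code: the function name `test_mse_curve`, the synthetic data model, the noise level, and every default value are assumptions chosen only for illustration, and the paper itself refers to real datasets.

```python
import numpy as np


def test_mse_curve(num_features=20,
                   train_sizes=(2, 5, 10, 15, 20, 25, 40, 60),
                   noise_std=0.5, num_trials=200, num_test=1000, seed=0):
    """Mean test MSE of minimum-norm least squares for each training-set size.

    Hypothetical helper: synthetic data y = x . w + noise with x ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    # Fixed ground-truth weights with unit norm (an assumption of this sketch).
    true_w = rng.normal(size=num_features)
    true_w /= np.linalg.norm(true_w)

    curve = {}
    for n in train_sizes:
        errors = []
        for _ in range(num_trials):
            # Draw a fresh training set of n points.
            X_train = rng.normal(size=(n, num_features))
            y_train = X_train @ true_w + noise_std * rng.normal(size=n)
            # The pseudoinverse gives the least-squares fit when n >= D and
            # the minimum-norm interpolating solution when n < D.
            w_hat = np.linalg.pinv(X_train) @ y_train
            # Evaluate on an independent test set.
            X_test = rng.normal(size=(num_test, num_features))
            y_test = X_test @ true_w + noise_std * rng.normal(size=num_test)
            errors.append(np.mean((X_test @ w_hat - y_test) ** 2))
        curve[n] = float(np.mean(errors))
    return curve


if __name__ == "__main__":
    for n, mse in test_mse_curve().items():
        print(f"N = {n:3d}  mean test MSE = {mse:.3f}")
```

Under these assumptions, the mean test error should be largest when N is close to D (here 20) and lower both for much smaller and much larger N, which is the non-monotonic shape the summary refers to. Averaging over many trials is a design choice of this sketch; the error near N = D is heavy-tailed, so individual runs can vary widely.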

Abstract (original)

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.

arXiv information

Authors	Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo
Published	2023-03-24 17:03:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
