Over-training with Mixup May Hurt Generalization

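Summary

Mixup creates synthetic training instances by linearly interpolating random pairs of samples, and it is a simple yet effective regularization technique for improving the performance of deep models trained with SGD.
This work reports a previously unobserved phenomenon in Mixup training: on many standard datasets, the performance of Mixup-trained models begins to degrade after a large number of training epochs, producing a U-shaped generalization curve.
This behavior becomes more pronounced as the size of the original dataset is reduced.
To help explain it, the authors show theoretically that Mixup training can introduce undesirable, data-dependent label noise into the synthesized data.
By analyzing a least-squares regression problem with a random feature model, they explain why noisy labels can cause the U-shaped curve: Mixup improves generalization by fitting the clean patterns early in training, but as training progresses it overfits to the noise in the synthetic data.
Extensive experiments on a variety of benchmark datasets validate this explanation.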
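
The interpolation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `mixup_batch`, the one-hot label encoding, and drawing the mixing weight from a Beta(alpha, alpha) distribution (the common choice in Mixup implementations) are all assumptions that the summary itself does not specify.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch.

    Hypothetical helper for illustration; alpha and the Beta prior are
    assumptions, not details taken from the summary.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)       # mixing weight in [0, 1]
    perm = rng.permutation(len(x))     # random partner index for every sample
    x_mix = lam * x + (1.0 - lam) * x[perm]                # interpolate inputs
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]  # interpolate labels
    return x_mix, y_mix, lam

# Toy batch: 4 samples, 3 features, 2 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
```

Each row of `y_mix` is a soft label that still sums to 1. A new weight and permutation are typically drawn at every training step, so the model rarely sees the same synthetic sample twice.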

Abstract (original)

Mixup, which creates synthetic training instances by linearly interpolating random sample pairs, is a simple and yet effective regularization technique to boost the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of original dataset is reduced. To help understand such a behavior of Mixup, we show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data. Via analyzing a least-square regression problem with a random feature model, we explain why noisy labels may cause the U-shaped curve to occur: Mixup improves generalization through fitting the clean patterns at the early training stage, but as training progresses, Mixup becomes over-fitting to the noise in the synthetic data. Extensive experiments are performed on a variety of benchmark datasets, validating this explanation.
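
One way to read the label-noise claim, offered as a hedged illustration rather than the paper's own derivation: suppose a ground-truth labeling function $f$ exists, take two training pairs $(x, y)$ and $(x', y')$ with $y = f(x)$ and $y' = f(x')$, and let $\lambda \in [0, 1]$ be the mixing weight. The symbols $f$, $\lambda$, and $\varepsilon$ are assumptions introduced here for illustration.

```latex
\begin{align}
\tilde{x} &= \lambda x + (1-\lambda)\, x', \\
\tilde{y} &= \lambda y + (1-\lambda)\, y' = \lambda f(x) + (1-\lambda)\, f(x'), \\
\varepsilon(\tilde{x}) &= \tilde{y} - f(\tilde{x})
  = \lambda f(x) + (1-\lambda)\, f(x') - f\bigl(\lambda x + (1-\lambda)\, x'\bigr).
\end{align}
```

Under these assumptions, $\varepsilon$ vanishes when $f$ is linear along the segment between $x$ and $x'$ and is generally nonzero otherwise. Because it depends on which pair was mixed, the resulting label noise is data-dependent.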

arXiv information

Authors	Zixuan Liu, Ziqiao Wang, Hongyu Guo, Yongyi Mao
Published	2023-03-02 18:37:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
