MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels


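Deep learning has achieved great success, but it often relies on large amounts of accurately labeled training data, which are expensive and time-consuming to collect.
In this work, we propose an iterative selection approach based on the Weibull mixture model, which identifies clean data by considering the overall learning dynamics of each data instance.
To validate our method, we perform extensive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.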


Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction for reducing this cost is learning with noisy labels, which are ubiquitous in real-world applications. A critical challenge for such a learning task is to reduce the effect of network memorization of falsely labeled data. In this work, we propose an iterative selection approach based on the Weibull mixture model, which identifies clean data by considering the overall learning dynamics of each data instance. In contrast to previous small-loss heuristics, we leverage the observation that deep networks memorize clean data easily and forget it with difficulty. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times between being misclassified and being memorized during training, and integrate them into a novel metric for selection. Based on the proposed metric, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset, which is finally used for model training. To validate our method, we perform extensive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.
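
To make the notion of transition times concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each training instance has a per-epoch record of whether the network classified it correctly, and it splits that record into the lengths of misclassified runs (time to memorize) and correct runs (time to forget). The function name `transition_times` and the handling of unfinished runs are assumptions made for illustration; the Weibull mixture fit and the selection metric described in the abstract are not reproduced here.

```python
from typing import List, Tuple


def transition_times(correct: List[bool]) -> Tuple[List[int], List[int]]:
    """Split a per-epoch correctness history into run lengths.

    Hypothetical helper, not taken from the paper.
    Returns (memorize_times, forget_times), where
      memorize_times: lengths of misclassified runs that end in a correct prediction
      forget_times:   lengths of correct runs that end in a misclassification
    A run still open at the final epoch is ignored in this sketch.
    """
    memorize_times: List[int] = []
    forget_times: List[int] = []
    run_length = 1
    for prev, cur in zip(correct, correct[1:]):
        if cur == prev:
            # Same state as the previous epoch: the current run continues.
            run_length += 1
            continue
        # State flipped: record the finished run under the state it was in.
        if prev:
            forget_times.append(run_length)
        else:
            memorize_times.append(run_length)
        run_length = 1
    return memorize_times, forget_times


# Example: misclassified for 2 epochs, correct for 3, misclassified for 1, then correct.
history = [False, False, True, True, True, False, True]
print(transition_times(history))  # ([2, 1], [3])
```

Under the observation stated in the abstract, a cleanly labeled instance would tend to show short memorization times and long forgetting times, while a falsely labeled one would tend to show the opposite pattern.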


Authors: Chuanyang Hu, Shipeng Yan, Zhitong Gao, Xuming He
Published: 2024-01-30 12:55:08+00:00

Categories: cs.CV, cs.LG