An Embedding is Worth a Thousand Noisy Labels

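Summary

Because the performance of deep neural networks scales with dataset size and label quality, efficiently mitigating low-quality data annotations is essential for building robust, cost-effective systems.
Existing strategies for handling label noise are severely limited by their computational complexity and their dependence on the application.
This work proposes WANN, a Weighted Adaptive Nearest Neighbor approach built on self-supervised feature representations obtained from foundation models.
To guide the weighted voting scheme, the authors introduce a reliability score that measures how likely a data label is to be correct.
Across diverse datasets of varying size, and under various noise types and severities, WANN outperforms reference methods, including a linear layer trained with robust loss functions.
WANN also generalizes better on imbalanced data than both Adaptive-NNs (ANN) and fixed k-NNs.
In addition, the proposed weighting scheme improves supervised dimensionality reduction under noisy labels.
This yields a significant boost in classification performance with image embeddings that are 10x and 100x smaller, minimizing latency and storage requirements.
With its emphasis on efficiency and explainability, the approach emerges as a simple, robust solution for overcoming the inherent limitations of deep neural network training.
The code is available at https://github.com/francescodisalvo05/wann-noisy-labels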
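
The summary does not spell out how the reliability score or the adaptive neighborhood is computed, so the following is only a minimal sketch of the general idea of a reliability-weighted nearest-neighbor vote, not the authors' implementation. It assumes a fixed k (the adaptive choice of k is omitted) and a hypothetical reliability score defined as the fraction of a training sample's k nearest neighbors that share its label. The function names, the Euclidean metric, and the toy data are all illustrative assumptions; in practice the rows of X would be embeddings from a pretrained foundation model.

```python
import numpy as np


def knn_indices(queries, refs, k, exclude_self=False):
    """Return, for each query row, the indices of its k nearest rows in refs."""
    # Pairwise Euclidean distances, shape (n_queries, n_refs).
    dist = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=2)
    if exclude_self:
        # Only meaningful when queries and refs are the same array.
        np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def reliability_scores(X, y, k):
    """Assumed score: share of a sample's k neighbors that carry its own label."""
    nbrs = knn_indices(X, X, k, exclude_self=True)
    return (y[nbrs] == y[:, None]).mean(axis=1)


def weighted_knn_predict(X_train, y_train, X_test, k, n_classes):
    """Classify test points by a vote in which each neighbor counts
    in proportion to its (assumed) reliability score."""
    weights = reliability_scores(X_train, y_train, k)
    nbrs = knn_indices(X_test, X_train, k)
    votes = np.zeros((len(X_test), n_classes))
    for c in range(n_classes):
        votes[:, c] = (weights[nbrs] * (y_train[nbrs] == c)).sum(axis=1)
    return votes.argmax(axis=1)


# Toy data: two well-separated clusters, one mislabeled point in each.
X_train = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0], [10.5, 10.5],
])
y_noisy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])  # indices 4 and 9 are flipped
X_test = np.array([[0.5, 0.4], [10.5, 10.4]])

scores = reliability_scores(X_train, y_noisy, k=3)
preds = weighted_knn_predict(X_train, y_noisy, X_test, k=3, n_classes=2)
print(scores, preds)
```

In this toy setup the two mislabeled points disagree with all of their neighbors, so their votes are down-weighted even though they are the closest training points to the test queries.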

Summary (original)

The performance of deep neural networks scales with dataset size and label quality, rendering the efficient mitigation of low-quality data annotations crucial for building robust and cost-effective systems. Existing strategies to address label noise exhibit severe limitations due to computational complexity and application dependency. In this work, we propose WANN, a Weighted Adaptive Nearest Neighbor approach that builds on self-supervised feature representations obtained from foundation models. To guide the weighted voting scheme, we introduce a reliability score, which measures the likelihood of a data label being correct. WANN outperforms reference methods, including a linear layer trained with robust loss functions, on diverse datasets of varying size and under various noise types and severities. WANN also exhibits superior generalization on imbalanced data compared to both Adaptive-NNs (ANN) and fixed k-NNs. Furthermore, the proposed weighting scheme enhances supervised dimensionality reduction under noisy labels. This yields a significant boost in classification performance with 10x and 100x smaller image embeddings, minimizing latency and storage requirements. Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome the inherent limitations of deep neural network training. The code is available at https://github.com/francescodisalvo05/wann-noisy-labels .

arXiv information

Authors	Francesco Di Salvo, Sebastian Doerrich, Ines Rieger, Christian Ledig
Published	2024-08-26 15:32:31+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
