Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning

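Summary

In real-world datasets, long-tailed distributions and noisy labels often coexist, hindering model training and degrading performance.
Existing work on long-tailed noisy label learning (LTNLL) usually assumes that noisy labels are generated independently of the long-tailed distribution, an assumption that often does not hold in practice.
In real-world settings, we observe that tail-class samples are more likely to be mislabeled as head classes, which worsens the original degree of imbalance.
We call this phenomenon "tail-to-head (T2H)" noise.
T2H noise severely degrades model performance by contaminating the head classes and forcing the model to learn tail samples as head samples.
To address this challenge, we study how noisy labels dynamically mislead the model and propose a new method called Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL).
DULL first uses Inner-Feature Disentangling (IFD) to disentangle features internally.
Building on this, it applies Inner-Feature Partial Unlearning (IFPU) to weaken and unlearn the incorrect feature regions that correlate with wrong classes.
This prevents the model from being misled by noisy labels and improves its robustness to noise.
To provide a controlled experimental setting, we also propose a new noise-addition algorithm that simulates T2H noise.
Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of the proposed method.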
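The abstract does not describe how the proposed T2H noise-addition algorithm works, so the following is only a hypothetical sketch of the idea under stated assumptions, not the authors' algorithm. It assumes an exponentially decaying long-tailed class profile (a common benchmark convention). As an illustrative choice, it flips each sample with probability `noise_rate` to a class that is more frequent than its own, weighted by class size. The helper names `long_tailed_counts`, `add_t2h_noise`, and `imbalance` are hypothetical. It illustrates the point made in the summary: tail-to-head flips raise the head-to-tail ratio. The class-independent noise assumed by earlier LTNLL work does not specifically push tail samples into head classes.

```python
import random
from collections import Counter


def long_tailed_counts(num_classes, max_count, imbalance_ratio):
    """Per-class sizes with exponential decay (assumed profile): class 0 is the
    head with max_count samples, the last class has max_count / imbalance_ratio."""
    return [
        int(max_count * (1.0 / imbalance_ratio) ** (c / (num_classes - 1)))
        for c in range(num_classes)
    ]


def add_t2h_noise(labels, noise_rate, num_classes, rng):
    """Hypothetical tail-to-head noise: with probability noise_rate, replace a
    label with a class that has more clean samples, chosen in proportion to
    that class's size. The largest class is never flipped."""
    sizes = Counter(labels)
    # Candidate target classes for each class: only strictly larger classes.
    heavier = {
        y: [c for c in range(num_classes) if sizes[c] > sizes[y]]
        for y in range(num_classes)
    }
    noisy = []
    for y in labels:
        targets = heavier[y]
        if targets and rng.random() < noise_rate:
            weights = [sizes[c] for c in targets]
            noisy.append(rng.choices(targets, weights=weights, k=1)[0])
        else:
            noisy.append(y)
    return noisy


def imbalance(labels, num_classes):
    """Ratio of the largest to the smallest class size (inf if a class is empty)."""
    sizes = [labels.count(c) for c in range(num_classes)]
    return float("inf") if min(sizes) == 0 else max(sizes) / min(sizes)


counts = long_tailed_counts(num_classes=10, max_count=500, imbalance_ratio=50)
clean = [c for c, n in enumerate(counts) for _ in range(n)]
noisy = add_t2h_noise(clean, noise_rate=0.3, num_classes=10, rng=random.Random(0))
print(imbalance(clean, 10), imbalance(noisy, 10))
```

In this sketch, flips can only go to more frequent classes. As a result, the head class can only gain samples and the rarest class can only lose them. The noisy imbalance ratio therefore never falls below the clean one.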

Summary (original)

In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist, posing obstacles to model training and performance. Existing studies on long-tailed noisy label learning (LTNLL) typically assume that the generation of noisy labels is independent of the long-tailed distribution, which may not be true from a practical perspective. In real-world situations, we observe that tail class samples are more likely to be mislabeled as head classes, exacerbating the original degree of imbalance. We call this phenomenon "tail-to-head (T2H)" noise. T2H noise severely degrades model performance by polluting the head classes and forcing the model to learn the tail samples as head. To address this challenge, we investigate the dynamic misleading process of the noisy labels and propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL). It first employs Inner-Feature Disentangling (IFD) to disentangle features internally. Based on this, Inner-Feature Partial Unlearning (IFPU) is then applied to weaken and unlearn incorrect feature regions correlated to wrong classes. This method prevents the model from being misled by noisy labels, enhancing the model's robustness against noise. To provide a controlled experimental environment, we further propose a new noise addition algorithm to simulate T2H noise. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of our proposed method.

arXiv information

Authors	Chen Shu,Mengke Li,Yiqun Zhang,Yang Lu,Bo Han,Yiu-ming Cheung,Hanzi Wang
Published	2025-03-14 13:58:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
