DIRECT: Deep Active Learning under Imbalance and Label Noise

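Summary

Class imbalance is a widespread problem in real-world machine learning applications and often leads to poor performance on rare and minority classes.
When unlabeled data from the wild is abundant, active learning is perhaps the most effective way to address the problem at its root: it collects a more balanced and informative set of labeled examples during annotation.
Label noise is another common problem in data annotation jobs, and it is especially challenging for active learning methods.
This work presents the first study of active learning under both class imbalance and label noise.
We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples closest to it.
Through a novel reduction to one-dimensional active learning, our algorithm DIRECT can draw on the classic active learning literature to handle issues such as batch labeling and tolerance to label noise.
We present extensive experiments on imbalanced datasets with and without label noise.
Our results show that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms, and more than 80% compared to random sampling.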

Summary (original)

Class imbalance is a prevalent issue in real-world machine learning applications, often leading to poor performance on rare and minority classes. With an abundance of wild unlabeled data, active learning is perhaps the most effective technique for solving the problem at its root: collecting a more balanced and informative set of labeled examples during annotation. Label noise is another common issue in data annotation jobs, which is especially challenging for active learning methods. In this work, we conduct the first study of active learning under both class imbalance and label noise. We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples that are closest to it. Through a novel reduction to one-dimensional active learning, our algorithm DIRECT is able to leverage the classic active learning literature to address issues such as batch labeling and tolerance towards label noise. We present extensive experiments on imbalanced datasets with and without label noise. Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms and more than 80% of the annotation budget compared to random sampling.
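
The abstract gives only an outline of the idea: place every example on a one-dimensional score axis, locate the class separation threshold, and label the examples nearest to it. The sketch below illustrates that outline in its simplest form and is not the authors' implementation. It assumes each example already has a scalar score from some model (higher means more likely to belong to the target class) and that labels are binary, target versus rest. The function names `estimate_threshold` and `select_batch`, the minimum-error rule for choosing the cutoff, and the 0.5 fallback are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def estimate_threshold(scores: Sequence[float], labels: Dict[int, int]) -> float:
    """Pick the score cutoff that misclassifies the fewest labeled examples.

    `labels` maps an example index to 1 (target class) or 0 (any other class).
    The decision rule is: predict 1 when score >= cutoff.
    """
    labeled = sorted((scores[i], y) for i, y in labels.items())
    if not labeled:
        return 0.5  # assumed fallback when nothing has been labeled yet

    values = [s for s, _ in labeled]
    # Candidate cutoffs: below every labeled score, between neighbours, above all.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(cutoff: float) -> int:
        return sum((s >= cutoff) != bool(y) for s, y in labeled)

    # min() keeps the first (lowest) cutoff among ties.
    return min(candidates, key=errors)


def select_batch(
    scores: Sequence[float], labels: Dict[int, int], batch_size: int
) -> List[int]:
    """Return indices of the unlabeled examples closest to the estimated cutoff."""
    cutoff = estimate_threshold(scores, labels)
    unlabeled = [i for i in range(len(scores)) if i not in labels]
    unlabeled.sort(key=lambda i: abs(scores[i] - cutoff))
    return unlabeled[:batch_size]


if __name__ == "__main__":
    scores = [0.0, 0.125, 0.25, 0.375, 0.5, 0.6875, 0.75, 1.0]
    labels = {0: 0, 2: 0, 6: 1, 7: 1}  # four examples annotated so far
    print(select_batch(scores, labels, batch_size=2))  # prints [4, 3]
```

In a full active learning loop, the selected indices would be sent for annotation, the model retrained, and the scores recomputed before the next batch. Handling noisy labels robustly would need more than this minimum-error cutoff, for example repeated queries near the threshold, and the abstract does not specify how DIRECT does this.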

arXiv information

Authors	Shyam Nuggehalli,Jifan Zhang,Lalit Jain,Robert Nowak
Published	2024-05-20 15:06:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
