DIRECT: Deep Active Learning under Imbalance and Label Noise

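Summary

Class imbalance is a common problem in real-world machine learning applications and often leads to poor performance on rare and minority classes.
When unlabeled data from the wild is abundant, active learning is perhaps the most effective way to address the problem at its root: during annotation, it collects a more balanced and informative set of labeled examples.
This work proposes a new algorithm that first identifies the class separation threshold and then annotates the most uncertain minority-class examples that lie close to that threshold.
Through a novel reduction to one-dimensional active learning, the algorithm, DIRECT, can draw on the classic active learning literature to handle issues such as batch labeling and tolerance to label noise.
DIRECT saves more than 15% of the annotation budget relative to state-of-the-art active learning algorithms and more than 90% relative to random sampling.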
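
The two-stage idea described above (locate a separation threshold along one dimension, then label the examples nearest to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_threshold_and_select`, the `oracle` callback, and the synthetic scores are all hypothetical, and plain binary search is swapped in for the threshold search, which assumes noise-free labels. The summary states that DIRECT tolerates label noise, which this sketch does not attempt.

```python
import numpy as np

def find_threshold_and_select(scores, oracle, label_budget):
    """Illustrative two-stage labelling on 1-D scores (hypothetical helper).

    scores: 1-D array, one score per unlabeled example.
    oracle: callable, oracle(i) -> 0 or 1 for example index i.
            Assumed noise-free and monotone in the score.
    label_budget: number of extra labels to spend near the threshold.
    Returns (threshold_estimate, {index: label}).
    """
    order = np.argsort(scores)      # ranks -> example indices
    n = len(order)
    labeled = {}

    def query(rank):
        idx = int(order[rank])
        if idx not in labeled:      # never pay twice for the same label
            labeled[idx] = oracle(idx)
        return labeled[idx]

    # Stage 1: binary search for the first rank whose label is 1.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid) == 1:
            hi = mid
        else:
            lo = mid + 1
    boundary = lo

    # Stage 2: spend the remaining budget on the unlabeled examples
    # whose rank is closest to the estimated boundary.
    by_distance = sorted(range(n), key=lambda r: abs(r - (boundary - 0.5)))
    added = 0
    for rank in by_distance:
        if added == label_budget:
            break
        if int(order[rank]) not in labeled:
            query(rank)
            added += 1

    # Report the threshold as the midpoint of the two scores at the boundary.
    left = scores[order[max(boundary - 1, 0)]]
    right = scores[order[min(boundary, n - 1)]]
    return (left + right) / 2, labeled

# Synthetic usage: examples with score >= 0.6 belong to class 1.
rng = np.random.default_rng(0)
scores = rng.permutation(np.linspace(0.0, 1.0, 100))
oracle = lambda i: int(scores[i] >= 0.6)
threshold, labeled = find_threshold_and_select(scores, oracle, label_budget=4)
```

In this toy setting the search needs only a handful of labels to locate the boundary, and every further label goes to an example adjacent to it, which is the intuition behind concentrating annotation effort near the separation threshold.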

Summary (original)

Class imbalance is a prevalent issue in real-world machine learning applications, often leading to poor performance in rare and minority classes. With an abundance of wild unlabeled data, active learning is perhaps the most effective technique in solving the problem at its root: collecting a more balanced and informative set of labeled examples during annotation. In this work, we propose a novel algorithm that first identifies the class separation threshold and then annotates the most uncertain examples from the minority classes, close to the separation threshold. Through a novel reduction to one-dimensional active learning, our algorithm DIRECT is able to leverage the classic active learning literature to address issues such as batch labeling and tolerance towards label noise. Our algorithm saves more than 15% of the annotation budget compared to state-of-the-art active learning algorithms and more than 90% of the annotation budget compared to random sampling.

arXiv information

Authors	Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak
Published	2023-12-14 18:18:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
