The Re-Label Method For Data-Centric Machine Learning

Summary

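In industrial deep learning applications, manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score of more than 90 on the dev dataset, we present a simple method that finds the noisy data and has humans re-label it, with the model's predictions given to the labelers as references. In this paper, we illustrate our idea on a broad range of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. We verify our idea with evaluation results on the dev dataset and with human evaluation results.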

Summary (original)

In industrial deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score of more than 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, giving them the model predictions as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The dev dataset evaluation results and human evaluation results verify our idea.
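The abstract does not spell out how the noisy data is found. As an illustration only, here is a minimal Python sketch for a classification task, assuming that an example is flagged when a trained model's most probable class disagrees with the existing human label with at least a chosen confidence, and that the flagged examples are then sent to a human together with the model's prediction as a reference. All names (`RelabelItem`, `find_suspect_labels`, `apply_relabels`, `min_confidence`) are hypothetical, and the `annotate` callback merely stands in for the human annotator.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RelabelItem:
    """One example queued for human review (hypothetical structure)."""
    index: int          # position in the original dataset
    text: str           # raw input shown to the annotator
    human_label: int    # existing, possibly noisy, label
    model_label: int    # model prediction, shown as a reference
    confidence: float   # model probability assigned to model_label


def find_suspect_labels(
    texts: Sequence[str],
    labels: Sequence[int],
    probs: Sequence[Sequence[float]],
    min_confidence: float = 0.0,
) -> List[RelabelItem]:
    """Flag examples whose label disagrees with the model's argmax prediction.

    Assumption: probs[i] is the model's class-probability vector for example i,
    ideally from a model that did not train on example i (e.g. cross-validation).
    """
    queue = []
    for i, (text, label, p) in enumerate(zip(texts, labels, probs)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class index
        if pred != label and p[pred] >= min_confidence:
            queue.append(RelabelItem(i, text, label, pred, p[pred]))
    # Show the most confident disagreements to the annotator first.
    queue.sort(key=lambda item: item.confidence, reverse=True)
    return queue


def apply_relabels(
    labels: Sequence[int],
    queue: List[RelabelItem],
    annotate: Callable[[RelabelItem], int],
) -> List[int]:
    """Replace each flagged label with the label chosen by the annotator."""
    new_labels = list(labels)
    for item in queue:
        new_labels[item.index] = annotate(item)
    return new_labels


# Toy usage: "terrible plot" carries a suspicious positive label.
texts = ["great movie", "terrible plot", "loved it", "boring"]
labels = [1, 1, 1, 0]
probs = [[0.1, 0.9], [0.95, 0.05], [0.2, 0.8], [0.7, 0.3]]

queue = find_suspect_labels(texts, labels, probs, min_confidence=0.6)
# Stand-in for a human who, after seeing the reference, agrees with the model.
fixed = apply_relabels(labels, queue, annotate=lambda item: item.model_label)
print(fixed)  # [1, 0, 1, 0]
```

In a real setting the `annotate` step is a person who may accept or reject the model's suggestion; the model prediction only serves as a reference, which matches the abstract's description of human re-labeling. The confidence threshold is a design choice of this sketch that trades review workload against recall of noisy labels.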

arXiv information

Author	Tong Guo
Published	2024-11-01 02:49:24+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, DeepL
