DOST — Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels

Abstract

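The enormous demand for annotated data created by deep learning techniques brings with it the problem of annotation noise.
Although this problem has been widely discussed in the machine learning literature, it remains relatively unexplored for multi-label classification (MLC) tasks, which feature more complicated kinds of noise.
Moreover, when the problem domain has logical constraints, noisy annotations often worsen violations of those constraints, making such a system unacceptable to domain experts.
This paper studies the effect of label noise on domain rule violations in MLC tasks and incorporates domain rules into the learning algorithm to mitigate that effect.
We propose the Domain Obedient Self-supervised Training (DOST) paradigm, which not only aligns deep learning models more closely with domain rules but also improves learning performance on key metrics and minimizes the effect of annotation noise.
This approach uses domain guidance to detect offending annotations and to deter rule-violating predictions in a self-supervised manner, making it more "data efficient" and domain compliant.
Empirical studies on two large-scale multi-label classification datasets show that the method yields improvements across the board and often entirely counteracts the effect of noise.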

Abstract (original)

The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of ‘multi-label classification’ (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more ‘data efficient’ and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
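
The abstract does not state the rule format or the loss, so the following is only a minimal sketch of the general idea (flag annotations that break a domain rule, and penalize predictions that would break one), not the authors' DOST implementation. The implication-style rule format, the function names, the choice to drop suspect labels from the supervised term, and the `weight` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Assumed rule format (not from the paper): (a, b) means
# "if label a is present, label b must also be present".
Rule = Tuple[int, int]


def violated_rules(labels: List[int], rules: List[Rule]) -> List[Rule]:
    """Return the implication rules that a binary annotation vector breaks."""
    return [(a, b) for a, b in rules if labels[a] == 1 and labels[b] == 0]


def rule_penalty(probs: List[float], rules: List[Rule]) -> float:
    """Soft violation score: p(a) * (1 - p(b)), summed over all rules."""
    return sum(probs[a] * (1.0 - probs[b]) for a, b in rules)


def training_loss(
    probs: List[float],
    labels: List[int],
    rules: List[Rule],
    weight: float = 1.0,
    eps: float = 1e-7,
) -> float:
    """Binary cross-entropy on trusted labels plus a rule-violation penalty.

    Labels that take part in a violated rule are treated as suspect and
    left out of the supervised term (an illustrative choice, not the
    paper's stated procedure).
    """
    suspect = {i for rule in violated_rules(labels, rules) for i in rule}
    trusted = [i for i in range(len(labels)) if i not in suspect]

    supervised = 0.0
    for i in trusted:
        p = min(max(probs[i], eps), 1.0 - eps)  # clamp to avoid log(0)
        supervised -= labels[i] * math.log(p) + (1 - labels[i]) * math.log(1.0 - p)
    if trusted:
        supervised /= len(trusted)

    # The penalty needs no labels, so it still applies to noisy samples.
    return supervised + weight * rule_penalty(probs, rules)
```

Because the penalty term depends only on the model's predicted probabilities, it can be computed for samples whose annotations are unreliable, which is one way to read the "self-supervised" aspect described in the abstract.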

arXiv information

Authors	Soumadeep Saha, Utpal Garain, Arijit Ukil, Arpan Pal, Sundeep Khandelwal
Published	2023-08-09 17:53:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
