dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans

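Summary

Human annotators typically provide the annotated data used to train machine learning models such as neural networks.
However, human annotations are prone to noise, which impairs generalization performance.
Methodological research on approaches that counteract noisy annotations requires corresponding datasets for meaningful empirical evaluation.
We therefore introduce dopanim, a new benchmark dataset consisting of about 15,750 animal images from 15 classes with ground truth labels.
For approximately 10,500 of these images, 20 humans provided more than 52,000 annotations with an accuracy of about 67%.
Its key attributes include (1) the challenging task of classifying doppelganger animals, (2) human-estimated likelihoods as annotations, and (3) annotator metadata.
Using seven variants of this dataset, we benchmark well-known multi-annotator learning approaches and outline further evaluation use cases, such as learning beyond hard class labels and active learning.
Our dataset and a comprehensive codebase are publicly available to emulate the data collection process and to reproduce all empirical results.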

Summary (original)

Human annotators typically provide annotated data for training machine learning models, such as neural networks. Yet, human annotations are subject to noise, impairing generalization performances. Methodological research on approaches counteracting noisy annotations requires corresponding datasets for a meaningful empirical evaluation. Consequently, we introduce a novel benchmark dataset, dopanim, consisting of about 15,750 animal images of 15 classes with ground truth labels. For approximately 10,500 of these images, 20 humans provided over 52,000 annotations with an accuracy of circa 67%. Its key attributes include (1) the challenging task of classifying doppelganger animals, (2) human-estimated likelihoods as annotations, and (3) annotator metadata. We benchmark well-known multi-annotator learning approaches using seven variants of this dataset and outline further evaluation use cases such as learning beyond hard class labels and active learning. Our dataset and a comprehensive codebase are publicly available to emulate the data collection process and to reproduce all empirical results.
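As a rough illustration of what working with noisy labels from multiple annotators involves, the sketch below aggregates per-image annotations by majority vote and measures how often individual annotations match the ground truth. Majority voting is a common baseline and is not necessarily one of the approaches benchmarked in the paper. The data layout, the class names, and the function names `majority_vote` and `annotation_accuracy` are hypothetical and are not taken from the dopanim codebase.

```python
from collections import Counter

# Hypothetical toy data: each image id maps to the class labels given by
# different annotators. Names and values are illustrative, not from dopanim.
annotations = {
    "img_0": ["jaguar", "leopard", "jaguar"],
    "img_1": ["cheetah", "cheetah", "leopard", "cheetah"],
    "img_2": ["leopard"],
}
ground_truth = {"img_0": "jaguar", "img_1": "cheetah", "img_2": "jaguar"}


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def annotation_accuracy(annotations, ground_truth):
    """Fraction of individual annotations that match the ground truth."""
    total = sum(len(labels) for labels in annotations.values())
    correct = sum(
        label == ground_truth[image_id]
        for image_id, labels in annotations.items()
        for label in labels
    )
    return correct / total


# One aggregated label per image, plus the raw per-annotation accuracy.
aggregated = {
    image_id: majority_vote(labels) for image_id, labels in annotations.items()
}
accuracy = annotation_accuracy(annotations, ground_truth)
```

In this toy example an image with a single wrong annotation (`img_2`) cannot be corrected by voting, which is one reason methods that model individual annotators are of interest.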

arXiv information

Authors	Marek Herde,Denis Huseljic,Lukas Rauch,Bernhard Sick
Published	2024-07-30 16:27:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
