Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training


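We investigate the quality and value of minority annotations, examine their effect on the class distribution of the gold labels, and study how this affects the behaviour of models trained on the resulting datasets.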


Resolving disagreement in manual annotation typically involves removing unreliable annotators and applying a label aggregation strategy such as majority vote or expert opinion. Both steps can have the side effect of silencing or under-representing minority but equally valid opinions. In this paper, we study the impact of standard label aggregation strategies on minority opinion representation in sexism detection. We investigate the quality and value of minority annotations, and then examine their effect on the class distributions in gold labels, as well as how this affects the behaviour of models trained on the resulting datasets. Finally, we discuss the potential biases introduced by each method and how models can amplify them.
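
To make the aggregation effect concrete, the following is a minimal sketch of majority-vote aggregation, not the authors' implementation. The annotations, item names, label names, and the tie-breaking rule are all hypothetical and chosen only for illustration. It compares the label distribution across all raw annotations with the distribution of the aggregated gold labels.

```python
from collections import Counter

# Hypothetical toy annotations (not data from the paper):
# item id -> labels assigned by five annotators.
annotations = {
    "post_1": ["sexist", "not_sexist", "not_sexist", "not_sexist", "not_sexist"],
    "post_2": ["sexist", "sexist", "not_sexist", "not_sexist", "not_sexist"],
    "post_3": ["sexist", "sexist", "sexist", "not_sexist", "not_sexist"],
    "post_4": ["not_sexist"] * 5,
}


def majority_vote(labels):
    """Return the most frequent label.

    Ties are broken by alphabetical label order, an arbitrary
    assumption made only for this sketch.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(label for label, n in counts.items() if n == top)[0]


def distribution(labels):
    """Return the relative frequency of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Gold labels after aggregation: one label per item.
gold = {item: majority_vote(labels) for item, labels in annotations.items()}

# Distribution over every individual annotation vs. over gold labels.
raw_dist = distribution(
    [label for labels in annotations.values() for label in labels]
)
gold_dist = distribution(list(gold.values()))

print("raw: ", raw_dist)
print("gold:", gold_dist)
```

In this toy data, 6 of the 20 individual annotations are "sexist", but only 1 of the 4 gold labels is, so the minority label's share drops from 30% to 25% once disagreement is resolved by majority vote. The minority judgements on post_1 and post_2 leave no trace in the gold labels.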


Authors: Mugdha Pandya, Nafise Sadat Moosavi, Diana Maynard
Published: 2024-12-05 10:00:49+00:00

Category: cs.CL