A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

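Summary

Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data in which significant disagreement arises during moderation, reflecting the subjective nature of toxicity perception. Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight that is missed when only the majority label is considered.
In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement.
Our approach uses multitask learning, in which toxicity classification serves as the primary task and annotation disagreement is treated as an auxiliary task. In addition, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty.
The framework also lets moderators adjust the threshold for annotation disagreement, giving them flexibility in deciding when ambiguity should trigger a review.
We demonstrate that the joint approach enhances model performance, calibration, and uncertainty estimation, and that it improves parameter efficiency and the review process compared with single-task methods.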

Summary (original)

Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data where significant disagreement occurs during moderation, reflecting the subjective nature of toxicity perception. Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight missed when only the majority label is considered. In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement. Our approach uses multitask learning, where toxicity classification serves as the primary task and annotation disagreement is addressed as an auxiliary task. Additionally, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model’s inherent uncertainty in predicting toxicity and disagreement. The framework also allows moderators to adjust thresholds for annotation disagreement, offering flexibility in determining when ambiguity should trigger a review. We demonstrate that our joint approach enhances model performance, calibration, and uncertainty estimation, while offering greater parameter efficiency and improving the review process in comparison to single-task methods.
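
The abstract does not spell out how the conformalized disagreement estimate feeds the review decision, so the following is only a minimal sketch under stated assumptions: it uses plain split conformal prediction on absolute residuals, treats disagreement as a score in [0, 1], and flags a comment for review when the upper end of its conformal interval reaches a moderator-chosen threshold. The function names `conformal_quantile` and `needs_review`, the residual score, and the review rule are all hypothetical and are not taken from the paper.

```python
import numpy as np


def conformal_quantile(cal_pred, cal_true, alpha=0.1):
    """Split-conformal quantile of absolute residuals on a calibration set.

    Assumption: nonconformity score = |true disagreement - predicted disagreement|.
    """
    pred = np.asarray(cal_pred, dtype=float)
    true = np.asarray(cal_true, dtype=float)
    scores = np.sort(np.abs(true - pred))
    n = scores.size
    # 1-based rank with the usual finite-sample correction (n + 1).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return float("inf")
    return float(scores[k - 1])


def needs_review(disagreement_pred, q, threshold):
    """Flag comments whose interval upper bound reaches the threshold.

    Assumption: disagreement lies in [0, 1], so the upper bound is clipped.
    """
    upper = np.clip(np.asarray(disagreement_pred, dtype=float) + q, 0.0, 1.0)
    return upper >= threshold


# Usage with made-up calibration data (hypothetical values, not from the paper).
cal_pred = np.zeros(9)
cal_true = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4])
q = conformal_quantile(cal_pred, cal_true, alpha=0.5)
flags = needs_review([0.1, 0.5], q, threshold=0.6)
```

Raising `threshold` sends fewer comments to human review, while lowering `alpha` widens the intervals and sends more; this is one possible reading of the adjustable threshold described in the abstract.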

arXiv information

Authors	Guillermo Villate-Castillo, Javier Del Ser, Borja Sanz
Published	2024-11-06 18:08:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
