A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

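Abstract

Content moderation typically combines the work of human moderators and machine learning models.
However, these systems often rely on data in which significant disagreement arises during moderation, reflecting the subjective nature of toxicity perception.
Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight that is missed when only the majority label is considered.
In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement.
Our approach uses multitask learning, in which toxicity classification serves as the primary task and annotation disagreement is treated as an auxiliary task.
In addition, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty in predicting toxicity and disagreement.
The framework also lets moderators adjust the threshold for annotation disagreement, offering flexibility in deciding when ambiguity should trigger a review.
We demonstrate that our joint approach improves model performance, calibration, and uncertainty estimation, while offering greater parameter efficiency and improving the review process compared with single-task methods.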

Abstract (original)

Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data where significant disagreement occurs during moderation, reflecting the subjective nature of toxicity perception. Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight missed when only the majority label is considered. In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement. Our approach uses multitask learning, where toxicity classification serves as the primary task and annotation disagreement is addressed as an auxiliary task. Additionally, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model’s inherent uncertainty in predicting toxicity and disagreement. The framework also allows moderators to adjust thresholds for annotation disagreement, offering flexibility in determining when ambiguity should trigger a review. We demonstrate that our joint approach enhances model performance, calibration, and uncertainty estimation, while offering greater parameter efficiency and improving the review process in comparison to single-task methods.
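
The abstract does not spell out how the conformal step or the review trigger is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes split conformal prediction with the common nonconformity score 1 - p(true label), a model that outputs class probabilities plus a scalar disagreement estimate in [0, 1], and a review rule that fires when the prediction set is not a single label or the estimated disagreement exceeds a moderator-chosen threshold. All function and parameter names (`conformal_threshold`, `prediction_set`, `needs_review`, `max_disagreement`) are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data (split conformal prediction).

    Assumption: the nonconformity score is 1 - p(true label); the paper may use another.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample quantile
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Return every class index whose score 1 - p is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


def needs_review(probs, disagreement, qhat, max_disagreement):
    """Hypothetical review rule: escalate to a human when the model is unsure
    (prediction set is empty or has several labels) or when the estimated
    annotator disagreement is above the moderator-chosen threshold."""
    labels = prediction_set(probs, qhat)
    return len(labels) != 1 or disagreement > max_disagreement


if __name__ == "__main__":
    # Toy calibration data: [p(non-toxic), p(toxic)] and the true class index.
    cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    cal_labels = [0, 1, 0, 1]
    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

    # A confident prediction with low estimated disagreement.
    print(needs_review([0.95, 0.05], 0.1, qhat, max_disagreement=0.3))
    # The same prediction, but annotators are expected to disagree.
    print(needs_review([0.95, 0.05], 0.5, qhat, max_disagreement=0.3))
```

Raising `max_disagreement` sends fewer comments to human review, while lowering `alpha` widens the prediction sets and sends more; both knobs are illustrative stand-ins for the adjustable thresholds the abstract mentions.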

arXiv information

Authors	Guillermo Villate-Castillo, Javier Del Ser, Borja Sanz
Published	2024-11-07 07:12:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
