A Federated Approach to Few-Shot Hate Speech Detection for Marginalized Communities

Summary

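Online hate speech remains an understudied problem for marginalized communities, and its relevance is growing, especially in the Global South, which includes developing societies with increasing internet penetration.
This paper aims to give marginalized communities living in societies where the dominant language is low-resource a privacy-preserving tool that protects them from hate speech on the internet by filtering offensive content in their native languages.
Our contribution is twofold. 1) We release REACT (REsponsive hate speech datasets Across ConTexts), a collection of high-quality, culture-specific hate speech detection datasets covering seven distinct target groups in eight low-resource languages, curated by experienced data collectors.
2) We propose a solution to few-shot hate speech detection that uses federated learning (FL), a privacy-preserving and collaborative learning approach, to continuously improve a central model that remains robust across different target groups and languages.
Keeping training local to users' devices preserves the privacy of their data while still benefiting from the efficiency of federated learning.
We also personalize client models to target-specific training data and evaluate their performance.
Our results indicate that FL is effective across different target groups, whereas the benefits of personalization for few-shot learning are not clear.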

Abstract (original)

Hate speech online remains an understudied issue for marginalized communities, and has seen rising relevance, especially in the Global South, which includes developing societies with increasing internet penetration. In this paper, we aim to provide marginalized communities living in societies where the dominant language is low-resource with a privacy-preserving tool to protect themselves from hate speech on the internet by filtering offensive content in their native languages. Our contribution in this paper is twofold: 1) we release REACT (REsponsive hate speech datasets Across ConTexts), a collection of high-quality, culture-specific hate speech detection datasets comprising seven distinct target groups in eight low-resource languages, curated by experienced data collectors; 2) we propose a solution to few-shot hate speech detection utilizing federated learning (FL), a privacy-preserving and collaborative learning approach, to continuously improve a central model that exhibits robustness when tackling different target groups and languages. By keeping the training local to the users’ devices, we ensure the privacy of the users’ data while benefitting from the efficiency of federated learning. Furthermore, we personalize client models to target-specific training data and evaluate their performance. Our results indicate the effectiveness of FL across different target groups, whereas the benefits of personalization on few-shot learning are not clear.
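
The abstract does not say which aggregation rule the authors use, so the following is only a minimal sketch of the general idea, assuming plain federated averaging (FedAvg) over a toy logistic-regression classifier. All names (`local_update`, `federated_average`, `run_rounds`, `predict`), the three-feature toy data, and the hyperparameters are hypothetical and are not taken from the paper. It illustrates how each client trains on its own few examples, how only model weights (never the examples) reach the server, and how a client could later personalize the shared model on its own data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, examples, lr=0.5, epochs=5):
    """Train a copy of the given weights on one client's few-shot data (SGD)."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in examples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def federated_average(client_weights, client_sizes):
    """Average client weights, weighted by each client's number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(global_weights, clients, rounds=10):
    """Each round: clients train locally, the server only sees their weights."""
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


def predict(weights, features):
    score = sigmoid(sum(w * x for w, x in zip(weights, features)))
    return 1 if score >= 0.5 else 0


# Hypothetical toy data: features are [bias, f1, f2]; label 1 = hateful.
clients = [
    [([1.0, 1.0, 0.0], 1), ([1.0, 0.0, 1.0], 0), ([1.0, 0.9, 0.1], 1)],
    [([1.0, 0.8, 0.2], 1), ([1.0, 0.1, 0.9], 0)],
    [([1.0, 0.2, 1.0], 0), ([1.0, 1.0, 0.1], 1), ([1.0, 0.0, 0.8], 0)],
]

global_model = run_rounds([0.0, 0.0, 0.0], clients)

# Personalization step: one client fine-tunes the shared model on its own data.
personalized_model = local_update(global_model, clients[0])
```

In a real deployment the classifier would be a multilingual language model rather than three hand-made features, and the choice of aggregation rule, number of rounds, and personalization strategy would need to follow the paper itself.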

arXiv information

Authors	Haotian Ye, Axel Wisiorek, Antonis Maronikolakis, Özge Alaçam, Hinrich Schütze
Published	2024-12-06 11:00:05+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
