Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection

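Summary

Hate speech detection is an important area of research in natural language processing and is essential for ensuring the safety of online communities.
However, detecting implicit hate speech, in which harmful intent is conveyed in subtle or indirect ways, remains a major challenge.
Unlike explicit hate speech, implicit expressions often rely on context, cultural nuance, and hidden biases, which makes them harder to identify consistently.
In addition, how such speech is interpreted is shaped by external knowledge and demographic biases, so detection results vary across language models.
Furthermore, large language models often show heightened sensitivity to toxic language and to references to vulnerable groups, which can lead to misclassification.
This oversensitivity produces false positives (harmless statements wrongly flagged as hateful) and false negatives (genuinely harmful content that goes undetected).
Addressing these problems requires methods that not only improve detection precision but also reduce model bias and increase robustness.
To address these challenges, we propose a new method that uses in-context learning without requiring model fine-tuning.
By adaptively retrieving demonstrations that focus on similar groups or on those with the highest similarity scores, our approach improves contextual understanding.
Experimental results show that our method outperforms current state-of-the-art techniques.
Implementation details and code are available at TBD.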

Summary (original)

Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. However, detecting implicit hate speech, where harmful intent is conveyed in subtle or indirect ways, remains a major challenge. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases, making them more challenging to identify consistently. Additionally, the interpretation of such speech is influenced by external knowledge and demographic biases, resulting in varied detection results across different language models. Furthermore, Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. This over-sensitivity results in false positives (incorrectly identifying harmless statements as hateful) and false negatives (failing to detect genuinely harmful content). Addressing these issues requires methods that not only improve detection precision but also reduce model biases and enhance robustness. To address these challenges, we propose a novel method, which utilizes in-context learning without requiring model fine-tuning. By adaptively retrieving demonstrations that focus on similar groups or those with the highest similarity scores, our approach enhances contextual comprehension. Experimental results show that our method outperforms current state-of-the-art techniques. Implementation details and code are available at TBD.
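The abstract describes the retrieval step only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes each candidate demonstration carries a precomputed embedding and an optional target-group annotation; the names `Demo`, `select_demonstrations`, and `build_prompt`, the `group` field, the "same group first, then fill by cosine similarity" rule, and the prompt wording are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Demo:
    text: str
    label: str             # e.g. "hateful" / "not hateful"
    group: Optional[str]   # hypothetical target-group annotation (None if unknown)
    vec: List[float]       # precomputed sentence embedding (assumed to be given)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_demonstrations(query_vec, query_group, pool, k=4):
    """Hypothetical rule: demos targeting the query's group come first,
    remaining slots are filled with the most similar other demos."""
    ranked = sorted(pool, key=lambda d: cosine(query_vec, d.vec), reverse=True)

    def same_group(d):
        return query_group is not None and d.group == query_group

    preferred = [d for d in ranked if same_group(d)]
    others = [d for d in ranked if not same_group(d)]
    return (preferred + others)[:k]


def build_prompt(query_text, demos):
    """Format the selected demos as an in-context-learning prompt (no fine-tuning)."""
    lines = ["Classify each post as 'hateful' or 'not hateful'.", ""]
    for d in demos:
        lines.append(f"Post: {d.text}\nLabel: {d.label}\n")
    lines.append(f"Post: {query_text}\nLabel:")
    return "\n".join(lines)


# Toy pool with 2-d "embeddings" (made-up values, for illustration only).
pool = [
    Demo("a", "hateful", "immigrants", [1.0, 0.0]),
    Demo("b", "not hateful", "women", [0.9, 0.1]),
    Demo("c", "not hateful", "immigrants", [0.0, 1.0]),
    Demo("d", "hateful", None, [0.7, 0.7]),
]

chosen = select_demonstrations([1.0, 0.05], "immigrants", pool, k=3)
print([d.text for d in chosen])
print(build_prompt("new post", chosen))
```

With a target group set, demo `c` is selected despite its low similarity because it shares the query's group; with `query_group=None` the selection reduces to plain top-k similarity retrieval. Whether the paper prioritizes groups this way, or switches between the two strategies by some other criterion, is not stated in the abstract.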

arXiv information

Authors	Yumin Kim, Hwanhee Lee
Published	2025-04-16 13:43:23+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google