Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

Summary

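The rapid growth of social media platforms has raised serious concerns about the toxicity of online content.
When large language models (LLMs) are used for toxicity detection, two key challenges arise: 1) the lack of domain-specific toxic knowledge leads to false negatives;
2) LLMs' excessive sensitivity to toxic speech produces false positives, which restricts freedom of speech.
To address these problems, we propose a new method called MetaTox, which uses graph search over a meta-toxic knowledge graph to strengthen hatred and toxicity detection.
First, we build a comprehensive meta-toxic knowledge graph by using LLMs to extract toxic information through a three-step pipeline, with toxicity benchmark datasets serving as the corpora.
Next, we query the graph through retrieval and ranking to supply accurate, relevant toxic knowledge.
Extensive experiments and detailed case studies across multiple datasets show that MetaTox substantially lowers the false positive rate while improving overall toxicity detection performance.
Our code is available at https://github.com/YiboZhao624/MetaTox.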

Abstract (original)

The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false positives, limiting freedom of speech. To address these issues, we propose a novel method called MetaTox, leveraging graph search on a meta-toxic knowledge graph to enhance hatred and toxicity detection. First, we construct a comprehensive meta-toxic knowledge graph by utilizing LLMs to extract toxic information through a three-step pipeline, with toxic benchmark datasets serving as corpora. Second, we query the graph via retrieval and ranking processes to supplement accurate, relevant toxic knowledge. Extensive experiments and in-depth case studies across multiple datasets demonstrate that our MetaTox significantly decreases the false positive rate while boosting overall toxicity detection performance. Our code is available at https://github.com/YiboZhao624/MetaTox.
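
The retrieve-then-rank idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository linked above). The `Triple` type, the toy graph, and the token-overlap scoring are all assumptions made for illustration, standing in for whatever retrieval and ranking MetaTox actually uses. No LLM is called; the sketch only shows how retrieved graph knowledge might be attached to a prompt.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a toy knowledge graph (hypothetical schema)."""
    head: str
    relation: str
    tail: str


# Toy graph with mild, made-up entries; a real graph would be built from corpora.
TOY_GRAPH = [
    Triple("troll", "means", "a person who posts to provoke others"),
    Triple("troll", "appears_in", "folklore as a cave creature"),
    Triple("bot", "used_as", "a dismissive label for a real user"),
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(graph, text):
    """Retrieval step: keep triples whose head entity shares a token with the text."""
    tokens = tokenize(text)
    return [t for t in graph if tokenize(t.head) & tokens]


def rank(candidates, text, top_k=2):
    """Ranking step: order candidates by token overlap with the text (an assumed score)."""
    tokens = tokenize(text)

    def score(t):
        return len(tokenize(f"{t.head} {t.tail}") & tokens)

    return sorted(candidates, key=score, reverse=True)[:top_k]


def build_prompt(text, knowledge):
    """Attach the selected triples to the detection prompt as background knowledge."""
    facts = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in knowledge)
    if not facts:
        facts = "- (none retrieved)"
    return (
        f"Background knowledge:\n{facts}\n\n"
        f"Post: {text}\nIs this post toxic? Answer yes or no."
    )


if __name__ == "__main__":
    post = "That troll only posts to provoke others."
    selected = rank(retrieve(TOY_GRAPH, post), post, top_k=1)
    print(build_prompt(post, selected))
```

In this sketch the ranking step is what separates the slang sense of a term from an unrelated sense that shares the same head entity, which is one plausible way supplementary knowledge could reduce both missed and spurious detections.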

arXiv information

Authors	Yibo Zhao, Jiapeng Zhu, Can Xu, Yao Liu, Xiang Li
Published	2025-06-02 11:45:29+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
