ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

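Abstract

Large language models (LLMs) are increasingly applied to automated harmful content detection, helping moderators identify policy violations and improving the overall efficiency and accuracy of content review.
However, existing resources for harmful content detection focus mainly on English; Chinese datasets remain scarce and are often limited in scope.
We present a comprehensive, professionally annotated benchmark for Chinese harmful content detection that covers six representative categories and is built entirely from real-world data.
Our annotation process also yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection.
In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs.
Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.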

Abstract (original)

Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.

arXiv information

Authors	Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng
Published	2025-06-12 17:57:05+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
