LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation

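Summary

Content moderation is a global challenge, yet major tech platforms prioritize high-resource languages, leaving low-resource languages short of native moderators.
Because effective moderation depends on understanding contextual cues, this imbalance raises the risk of improper moderation caused by non-native moderators' limited cultural understanding.
Through a user study, we find that non-native moderators struggle to interpret culturally specific knowledge, sentiment, and internet culture when moderating hate speech.
To assist them, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps:
(1) RAG-enhanced cultural context annotations;
(2) initial LLM-based moderation; and
(3) targeted human moderation for cases lacking LLM consensus.
Evaluated on a Korean hate speech dataset with Indonesian and German participants, the system achieves 78% accuracy (surpassing GPT-4o's 71% baseline) while reducing human workload by 83.6%.
Notably, human moderators excel on nuanced content where LLMs struggle.
Our findings suggest that non-native moderators, when properly supported by LLMs, can contribute effectively to cross-cultural hate speech moderation.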
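
The three-step flow described above can be pictured with a minimal Python sketch. It is an illustration, not the authors' implementation. The abstract does not define how "LLM consensus" is measured, so this sketch assumes several LLM moderators vote and treats anything short of a unanimous vote as lacking consensus. All names here (`run_pipeline`, `annotate`, `llm_moderators`, `human_moderate`, `human_workload_fraction`) are hypothetical, and the retrieval and model calls are left as caller-supplied functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Decision:
    comment: str
    annotation: str          # step 1 output: cultural context note
    votes: List[str]         # step 2 output: one label per LLM moderator
    label: Optional[str]     # final label; None if still awaiting a human
    routed_to_human: bool    # True when the LLM votes disagreed


def unanimous(votes: Sequence[str]) -> Optional[str]:
    """Return the shared label if every vote agrees, else None.

    Assumption: "consensus" means unanimity; the source does not say.
    """
    return votes[0] if votes and len(set(votes)) == 1 else None


def run_pipeline(
    comments: Sequence[str],
    annotate: Callable[[str], str],
    llm_moderators: Sequence[Callable[[str, str], str]],
    human_moderate: Optional[Callable[[str, str], str]] = None,
) -> List[Decision]:
    """Hypothetical driver for the three steps named in the summary."""
    decisions = []
    for comment in comments:
        # Step 1: attach a cultural context annotation (RAG-backed in the paper).
        annotation = annotate(comment)
        # Step 2: each LLM moderator labels the annotated comment.
        votes = [moderate(comment, annotation) for moderate in llm_moderators]
        label = unanimous(votes)
        # Step 3: only comments without LLM consensus go to a human.
        routed = label is None
        if routed and human_moderate is not None:
            label = human_moderate(comment, annotation)
        decisions.append(Decision(comment, annotation, votes, label, routed))
    return decisions


def human_workload_fraction(decisions: Sequence[Decision]) -> float:
    """Share of comments a human had to review (0.0 for an empty batch)."""
    if not decisions:
        return 0.0
    return sum(d.routed_to_human for d in decisions) / len(decisions)
```

Under these assumptions, the reported 83.6% workload reduction would correspond to `human_workload_fraction` returning roughly 0.164 on the evaluation set, since a human only sees the comments on which the LLM votes disagree.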

Abstract (original)

Content moderation is a global challenge, yet major tech platforms prioritize high-resource languages, leaving low-resource languages with scarce native moderators. Since effective moderation depends on understanding contextual cues, this imbalance increases the risk of improper moderation due to non-native moderators’ limited cultural understanding. Through a user study, we identify that non-native moderators struggle with interpreting culturally-specific knowledge, sentiment, and internet culture in the hate speech moderation. To assist them, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps: (1) RAG-enhanced cultural context annotations; (2) initial LLM-based moderation; and (3) targeted human moderation for cases lacking LLM consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, our system achieves 78% accuracy (surpassing GPT-4o’s 71% baseline), while reducing human workload by 83.6%. Notably, human moderators excel at nuanced contents where LLMs struggle. Our findings suggest that non-native moderators, when properly supported by LLMs, can effectively contribute to cross-cultural hate speech moderation.

arXiv information

Authors	Junyeong Park, Seogyeong Jeong, Seyoung Song, Yohan Lee, Alice Oh
Published	2025-03-10 12:20:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
