JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models’ Detection of Human Self-Destructive Behavior Content in Jirai Community

Abstract

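This paper introduces JiraiBench, the first bilingual benchmark for evaluating how effectively large language models detect self-destructive content in Chinese and Japanese social media communities.
Focusing on the transnational ‘Jirai’ (landmine) online subculture, which encompasses multiple forms of self-destructive behavior including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework that incorporates both linguistic and cultural dimensions.
Our dataset consists of 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotations along three behavioral categories, and achieves substantial inter-annotator agreement.
Experimental evaluation across four state-of-the-art models reveals significant performance variation depending on the language of the instructions, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content.
This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks.
Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target-language training.
These findings highlight the need for culturally informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.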

Abstract (original)

This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models’ effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational ‘Jirai’ (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target language training. These findings highlight the need for culturally-informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.

arXiv information

Authors	Yunze Xiao, Tingyu He, Lionel Z. Wang, Yiming Ma, Xingyu Song, Xiaohang Xu, Irene Li, Ka Chung Ng
Published	2025-03-27 16:48:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
