A Classification System Approach in Predicting Chinese Censorship

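Summary

This paper uses classifiers to predict whether a Weibo post will be censored on the Chinese internet.
Through randomized sampling from \citeauthor{Fu2021} and Chinese tokenization strategies, the authors built a cleaned Chinese phrase dataset with binary censorship labels.
Using various probability-based information retrieval methods on the data, they derived four logistic regression models for classification.
They also experimented with pre-trained transformers on similar classification tasks.
After evaluating both macro-F1 and ROC-AUC, they concluded that the fine-tuned BERT model outperforms the other strategies.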

Abstract (original)

This paper uses a classifier to predict whether a Weibo post will be censored on the Chinese internet. Through randomized sampling from \citeauthor{Fu2021} and Chinese tokenization strategies, we constructed a cleaned Chinese phrase dataset with binary censorship labels. Applying various probability-based information retrieval methods to the data, we derived four logistic regression models for classification. Furthermore, we experimented with pre-trained transformers on similar classification tasks. After evaluating both the macro-F1 and ROC-AUC metrics, we concluded that the fine-tuned BERT model outperforms the other strategies.
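
The abstract names macro-F1 and ROC-AUC as the evaluation metrics but does not show how they are computed. The following is a minimal sketch in plain Python, assuming binary labels (1 = censored, 0 = not censored) and a classifier that outputs a score per post. The function names, the 0.5 threshold, and the toy data are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true censorship labels and model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("macro-F1:", macro_f1(y_true, y_pred))
print("ROC-AUC:", roc_auc(y_true, scores))
```

Macro-F1 depends on the hard predictions and therefore on the chosen threshold, while ROC-AUC depends only on how the scores rank the posts, which is one reason to report both.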

arXiv information

Authors	Matt Prodani, Tianchu Ze, Yushen Hu
Published	2025-02-06 17:19:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
