Hate Speech Detection and Classification in Amharic Text with Deep Learning

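Abstract

Hate speech is a growing problem on social media.
It can seriously affect society, especially in countries such as Ethiopia, where it can trigger conflicts among diverse ethnic and religious groups.
Hate speech detection is progressing for resource-rich languages but is still lacking for low-resource languages such as Amharic.
To address this gap, we develop an Amharic hate speech dataset and an SBi-LSTM deep learning model that detects and classifies text into four categories: racial, religious, gender, and non-hate speech.
We annotated 5,000 Amharic social media posts and comments with these four categories.
The data were annotated by a total of 100 native Amharic speakers using a custom annotation tool.
The model achieves an F1-score of 94.8.
Future improvements include expanding the dataset and developing state-of-the-art models.
Keywords: Amharic hate speech detection, classification, Amharic dataset, deep learning, SBi-LSTM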
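
The abstract reports a single F1-score for a four-category task but does not say how the score is averaged across categories. The following is a minimal sketch of how per-category and macro-averaged F1 could be computed, assuming macro averaging (an assumption, not confirmed by the source). The label strings, function names, and toy predictions are hypothetical and are not taken from the authors' dataset or code.

```python
from collections import Counter

# Hypothetical label names for the four categories named in the abstract.
LABELS = ["racial", "religious", "gender", "non-hate"]


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return a dict mapping each label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Toy data, invented purely for illustration.
y_true = ["racial", "racial", "religious", "gender", "non-hate", "non-hate"]
y_pred = ["racial", "religious", "religious", "gender", "non-hate", "racial"]

scores = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
```

Macro averaging weights every category equally, so a rare category counts as much as a common one. A weighted or micro average would give a different number on an imbalanced dataset, which is why the averaging scheme matters when comparing reported scores.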

Abstract (original)

Hate speech is a growing problem on social media. It can seriously impact society, especially in countries like Ethiopia, where it can trigger conflicts among diverse ethnic and religious groups. While hate speech detection in resource rich languages are progressing, for low resource languages such as Amharic are lacking. To address this gap, we develop Amharic hate speech data and SBi-LSTM deep learning model that can detect and classify text into four categories of hate speech: racial, religious, gender, and non-hate speech. We have annotated 5k Amharic social media post and comment data into four categories. The data is annotated using a custom annotation tool by a total of 100 native Amharic speakers. The model achieves a 94.8 F1-score performance. Future improvements will include expanding the dataset and develop state-of-the art models. Keywords: Amharic hate speech detection, classification, Amharic dataset, Deep Learning, SBi-LSTM

arXiv information

Authors	Samuel Minale Gashe, Seid Muhie Yimam, Yaregal Assabie
Published	2024-08-07 15:46:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
