Sentiment Analysis Dataset in Moroccan Dialect: Bridging the Gap Between Arabic and Latin Scripted dialect


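One aspect of sentiment analysis that remains underrepresented is the Moroccan dialect, which has a distinctive linguistic landscape and multiple coexisting scripts.
Earlier work on Arabic-script dialects provided valuable insights, but it may not fully capture the complexity of Moroccan web content, which mixes Arabic and Latin script.
By assembling diverse textual data, the authors built a dataset of about 20,000 manually labeled texts in Moroccan dialect, together with publicly available lists of Moroccan dialect stop words.
Their model reached 92% accuracy, and to further demonstrate its reliability they tested it on smaller publicly available Moroccan dialect datasets, with favorable results.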


Sentiment analysis, the automated process of determining emotions or opinions expressed in text, has been explored extensively in natural language processing. However, one aspect that has remained underrepresented is sentiment analysis of the Moroccan dialect, which has a unique linguistic landscape and multiple coexisting scripts. Previous work in sentiment analysis primarily targeted dialects written in Arabic script. While these efforts provided valuable insights, they may not fully capture the complexity of Moroccan web content, which features a blend of Arabic and Latin script. Our study therefore emphasizes the importance of extending sentiment analysis to the entire spectrum of Moroccan linguistic diversity. Central to our research is the creation of the largest public dataset for Moroccan dialect sentiment analysis, covering Moroccan dialect written in Arabic script as well as in Latin letters. By assembling a diverse range of textual data, we constructed a dataset of about 20,000 manually labeled texts in Moroccan dialect, along with publicly available lists of Moroccan dialect stop words. We then conducted a comparative study of multiple machine learning models to assess how well they perform on our dataset. Experiments were run on both raw and preprocessed data to show the importance of the preprocessing step. Our model achieved 92% accuracy, and to further demonstrate its reliability we tested it on smaller publicly available Moroccan dialect datasets, with favorable results.
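
The abstract does not spell out the preprocessing steps, so the following is only a minimal sketch of what preprocessing for mixed Arabic and Latin script text might look like. The function names, the tiny `STOP_WORDS` sample, and the specific cleaning steps (URL removal, lowercasing, collapsing elongated characters, punctuation stripping) are assumptions made for illustration, not the authors' pipeline or their published stop-word lists.

```python
import re
from typing import List

# Hypothetical stop-word sample; the paper's actual lists are not reproduced here.
STOP_WORDS = {"و", "في", "w", "f"}

ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")  # basic Arabic Unicode block
LATIN_CHARS = re.compile(r"[A-Za-z]")
URL = re.compile(r"https?://\S+")
ELONGATION = re.compile(r"(.)\1{2,}")  # three or more repeats of one character
NON_WORD = re.compile(r"[^\w\s]")  # punctuation and symbols


def detect_script(text: str) -> str:
    """Classify a text as 'arabic', 'latin', 'mixed', or 'other'."""
    arabic = len(ARABIC_CHARS.findall(text))
    latin = len(LATIN_CHARS.findall(text))
    if arabic == 0 and latin == 0:
        return "other"
    if arabic and latin:
        return "mixed"
    return "arabic" if arabic else "latin"


def preprocess(text: str) -> List[str]:
    """Clean a text and return its tokens with stop words removed."""
    text = URL.sub(" ", text)             # drop links
    text = text.lower()                   # only affects Latin letters
    text = ELONGATION.sub(r"\1\1", text)  # "zwiiiin" -> "zwiin"
    text = NON_WORD.sub(" ", text)        # strip punctuation, keep letters and digits
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    for sample in ["Zwiiiiin bzaf!!! https://example.com w mzyan", "زوين بزاف و مزيان"]:
        print(detect_script(sample), preprocess(sample))
```

Digits are deliberately kept as word characters, since Latin-script Moroccan dialect commonly uses digits such as 3, 7, and 9 to stand in for Arabic letters. Comparing a classifier trained on the raw texts with one trained on the output of a function like `preprocess` is one way to run the raw versus preprocessed comparison the abstract describes.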


Authors: Mouad Jbel, Imad Hafidi, Abdulmutallib Metrane
Published: 2023-11-06 18:38:55+00:00


Category: cs.CL