The effect of stemming and lemmatization on Portuguese fake news text classification


With the popularization of the internet, smartphones, and social media, information spreads quickly and easily, which increases the volume of information circulating in the world. This larger flow also brings a problem that harms society: some people use it to disseminate deceptive information and fake news. The automatic detection of fake news is a challenging task, because obtaining good results requires dealing with linguistic problems, especially in languages that have not yet been comprehensively studied. Some techniques can nevertheless help to reach good results on text data. The motivation for detecting deceptive information lies in the fact that people need to know which information is true and trustworthy and which is not. In this work, we present the effect that pre-processing methods such as lemmatization and stemming have on fake news classification. To that end, we designed several classifier models that apply different pre-processing techniques. The results show that the pre-processing step is important for obtaining better results. Stemming and lemmatization are promising methods that need further study, so that techniques focused on the Portuguese language can be developed and better results reached.
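To make the role of stemming concrete, the following is a minimal sketch of how suffix stripping merges inflected Portuguese word forms into a single bag-of-words feature before classification. It is not the authors' pipeline: the suffix list, `toy_stem`, and `bag_of_words` are hypothetical names introduced only for illustration, and the rules are far cruder than those of a real Portuguese stemmer or lemmatizer.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration).
# Real Portuguese stemmers use much richer, ordered rule sets.
SUFFIXES = sorted(
    ("ções", "ção", "mente", "ados", "adas", "ado", "ada",
     "ar", "er", "ir", "os", "as", "s", "o", "a"),
    key=len,
    reverse=True,  # try the longest suffixes first
)


def toy_stem(token, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def bag_of_words(text, stem=False):
    """Count whitespace-separated, lowercased tokens, optionally stemmed."""
    tokens = text.lower().split()
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    headline = "notícia notícias publicado publicada"
    print(bag_of_words(headline))             # four distinct surface forms
    print(bag_of_words(headline, stem=True))  # forms merged by the toy stemmer
```

With stemming enabled, the singular and plural forms and the two gendered participles each map to one feature, so a classifier sees a smaller and less sparse vocabulary. Whether that helps or hurts accuracy is the empirical question the paper investigates.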


Authors: Lucca de Freitas Santos, Murilo Varges da Silva
Published: 2023-10-17 15:26:40+00:00

Categories: cs.AI, cs.CL