HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X

要約

TwitterやFacebookなどのソーシャルメディアプラットフォームの広範な使用により、あらゆる年齢の人々が自分の考えや経験を共有できるようになり、ユーザーが生成したコンテンツの膨大な蓄積につながりました。
しかし、これらのプラットフォームは、利点とともに、合理的な言説を損ない、民主的価値を脅かす可能性のあるヘイトスピーチと攻撃的なコンテンツを管理するという課題にも直面しています。
その結果、特にHinglish、German-English、Banglaなどのコードミックス言語を含む複数の言語でのコンテキスト分析が必要になる可能性のある会話の複雑さを考えると、そのようなコンテンツを検出および軽減するための自動化された方法の必要性が高まっています。
私たちは英語のタスクに参加しました。そこでは、英語のツイートを2つのカテゴリに分類する必要があります。
この作業では、GPT-3.5ターボなどの最先端の大規模な言語モデルをプロンプトで実験して、ツイートを憎しみや攻撃的または非憎悪に分類するように促します。
この研究では、3つの異なる実行にわたってMacro-F1スコアを使用して分類モデルのパフォーマンスを評価します。
すべてのクラスで精度とリコールのバランスをとるMacro-F1スコアは、モデル評価の主要なメトリックとして使用されます。
得られたスコアは、実行1の場合は0.756、実行2で0.751、ラン3で0.754であり、実行中の分散が最小限の高いパフォーマンスを示しています。
結果は、モデルが精度とリコールの観点から一貫してうまく機能し、実行1が最高のパフォーマンスを示していることを示唆しています。
これらの調査結果は、異なる実行にわたるモデルの堅牢性と信頼性を強調しています。

要約(オリジナル)

The widespread use of social media platforms like Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, leading to an immense accumulation of user-generated content. However, alongside the benefits, these platforms also face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. As a result, there is a growing need for automated methods to detect and mitigate such content, especially given the complexity of conversations that may require contextual analysis across multiple languages, including code-mixed languages like Hinglish, German-English, and Bangla. We participated in the English task where we have to classify English tweets into two categories namely Hate and Offensive and Non Hate-Offensive. In this work, we experiment with state-of-the-art large language models like GPT-3.5 Turbo via prompting to classify tweets into Hate and Offensive or Non Hate-Offensive. In this study, we evaluate the performance of a classification model using Macro-F1 scores across three distinct runs. The Macro-F1 score, which balances precision and recall across all classes, is used as the primary metric for model evaluation. The scores obtained are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, indicating a high level of performance with minimal variance among the runs. The results suggest that the model consistently performs well in terms of precision and recall, with run 1 showing the highest performance. These findings highlight the robustness and reliability of the model across different runs.

arxiv情報

著者	Aniket Deroy,Subhankar Maity
発行日	2025-03-25 12:53:14+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー