HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X

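Summary

The widespread use of social media platforms such as Twitter and Facebook lets people of all ages share their thoughts and experiences, producing an immense volume of user-generated content.
Alongside these benefits, the platforms must also manage hate speech and offensive content, which can undermine rational discourse and threaten democratic values.
As a result, there is a growing need for automated methods to detect and mitigate such content, particularly because conversations may require contextual analysis across multiple languages, including code-mixed languages such as Hinglish, German-English, and Bangla.
We participated in the English task, which requires classifying English tweets into two categories: Hate and Offensive, and Non Hate-Offensive.
We experiment with state-of-the-art large language models such as GPT-3.5 Turbo, prompting the model to classify each tweet as Hate and Offensive or Non Hate-Offensive.
We evaluate the classification model using Macro-F1 scores across three distinct runs.
The Macro-F1 score, which balances precision and recall across all classes, serves as the primary evaluation metric.
The scores are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, indicating a high level of performance with minimal variance among the runs.
The results suggest that the model performs consistently well in terms of precision and recall, with run 1 scoring highest.
These findings highlight the robustness and reliability of the model across runs.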
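
To make the metric concrete, the following is a minimal sketch of how a Macro-F1 score can be computed and how the three reported run scores can be summarized. The label names `HOF` and `NOT` and the toy predictions are assumptions made up for illustration; only the three run scores come from the abstract, and this is not the authors' evaluation code.

```python
from statistics import mean


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of the per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return mean(per_class)


# Toy data, invented for illustration (hypothetical label names).
y_true = ["HOF", "HOF", "NOT", "NOT", "NOT", "HOF"]
y_pred = ["HOF", "NOT", "NOT", "NOT", "HOF", "HOF"]
toy_score = macro_f1(y_true, y_pred, labels=["HOF", "NOT"])

# Macro-F1 scores reported in the abstract for runs 1 to 3.
run_scores = [0.756, 0.751, 0.754]
run_mean = mean(run_scores)
run_spread = max(run_scores) - min(run_scores)
```

Because every class contributes equally to the mean, a model cannot reach a high Macro-F1 by doing well only on the majority class. The spread between the best and worst reported run is 0.005.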

Summary (original)

The widespread use of social media platforms like Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, leading to an immense accumulation of user-generated content. However, alongside the benefits, these platforms also face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. As a result, there is a growing need for automated methods to detect and mitigate such content, especially given the complexity of conversations that may require contextual analysis across multiple languages, including code-mixed languages like Hinglish, German-English, and Bangla. We participated in the English task where we have to classify English tweets into two categories namely Hate and Offensive and Non Hate-Offensive. In this work, we experiment with state-of-the-art large language models like GPT-3.5 Turbo via prompting to classify tweets into Hate and Offensive or Non Hate-Offensive. In this study, we evaluate the performance of a classification model using Macro-F1 scores across three distinct runs. The Macro-F1 score, which balances precision and recall across all classes, is used as the primary metric for model evaluation. The scores obtained are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, indicating a high level of performance with minimal variance among the runs. The results suggest that the model consistently performs well in terms of precision and recall, with run 1 showing the highest performance. These findings highlight the robustness and reliability of the model across different runs.
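
The prompting setup described above could be sketched roughly as follows. The prompt wording, the function names, and the `ask_model` callable are all hypothetical: the abstract does not give the authors' actual prompt or API settings, so the model call is left as a pluggable function rather than tied to any specific client library.

```python
from typing import Callable

# The two categories named in the abstract.
LABELS = ("Hate and Offensive", "Non Hate-Offensive")


def build_prompt(tweet: str) -> str:
    """Build a classification prompt (hypothetical wording, not the paper's)."""
    return (
        "Classify the following tweet as exactly one of: "
        f"'{LABELS[0]}' or '{LABELS[1]}'.\n"
        f"Tweet: {tweet}\n"
        "Answer with the label only."
    )


def parse_label(reply: str) -> str:
    """Map a model reply onto one of the two labels, or raise if it matches neither."""
    normalized = reply.strip().strip(".'\"").lower()
    for label in LABELS:
        if normalized == label.lower():
            return label
    raise ValueError(f"Unrecognized label: {reply!r}")


def classify(tweet: str, ask_model: Callable[[str], str]) -> str:
    """Send the prompt through ask_model (any text-in, text-out function) and parse the reply."""
    return parse_label(ask_model(build_prompt(tweet)))


# Stand-in for a real model call, used only to exercise the pipeline.
def fake_model(prompt: str) -> str:
    return "Non Hate-Offensive"


example = classify("Good morning, everyone!", fake_model)
```

Raising on an unrecognized reply, instead of silently defaulting to one class, is a design choice in this sketch so that malformed model outputs are not counted toward either category.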

arXiv information

Authors	Aniket Deroy, Subhankar Maity
Published	2024-11-14 06:20:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
