Watch Your Language: Investigating Content Moderation with Large Language Models

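Summary

Large language models (LLMs) have exploded in popularity because they can perform a wide range of natural language tasks.
Text-based content moderation is one LLM use case that has recently drawn enthusiastic attention, yet little research has examined how LLMs perform in content moderation settings.
In this work, the authors evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection.
For rule-based community moderation, they instantiate 95 subcommunity-specific LLMs by prompting GPT-3.5 with the rules of 95 Reddit subcommunities.
They find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%.
For toxicity detection, they evaluate a suite of commodity LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly outperform currently widespread toxicity classifiers.
However, recent increases in model size add only marginal benefit to toxicity detection, suggesting that LLM performance on toxicity detection tasks may be reaching a plateau.
Finally, they outline avenues for future work on LLMs and content moderation.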
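The summary reports two distinct per-community metrics, accuracy and precision, each aggregated as a median across communities. The sketch below illustrates how such figures could be computed. It is a minimal illustration, not the authors' evaluation code: the community names, the toy labels, and the convention that `1` means "violates the community's rules" are all assumptions made for this example.

```python
from statistics import median

def accuracy(labels, preds):
    """Fraction of all decisions that match the ground-truth label."""
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels)

def precision(labels, preds):
    """Fraction of flagged items (pred == 1) that truly violate the rules."""
    flagged = [y for y, p in zip(labels, preds) if p == 1]
    return sum(flagged) / len(flagged) if flagged else 0.0

# Hypothetical toy data: community -> (true labels, model decisions).
# 1 = comment violates the rules, 0 = comment is acceptable (assumed convention).
communities = {
    "community_a": ([1, 0, 1, 0], [1, 0, 0, 0]),
    "community_b": ([1, 1, 0, 0], [1, 1, 1, 0]),
    "community_c": ([0, 1, 1, 1], [0, 1, 1, 1]),
}

# Score each community separately, then take the median across communities.
median_accuracy = median(accuracy(y, p) for y, p in communities.values())
median_precision = median(precision(y, p) for y, p in communities.values())

print(f"median accuracy:  {median_accuracy:.2f}")
print(f"median precision: {median_precision:.2f}")
```

Because the two metrics answer different questions (overall agreement versus how often a flagged comment is a true violation), a model can show high median precision alongside noticeably lower median accuracy, as in the figures quoted above.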

Abstract (original)

Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm, however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we instantiate 95 subcommunity specific LLMs by prompting GPT-3.5 with rules from 95 Reddit subcommunities. We find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we evaluate a suite of commodity LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly outperform currently widespread toxicity classifiers. However, recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.

arXiv information

Authors	Deepak Kumar, Yousef AbuHashem, Zakir Durumeric
Published	2024-01-17 17:41:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
