ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

Summary

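Despite the remarkable progress large language models have made in chatbots, maintaining a non-toxic user-AI interaction environment has become increasingly important.
However, previous work on toxicity detection has relied mainly on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored.
In this work, we introduce ToxicChat, a new benchmark based on real user queries from an open-source chatbot.
This benchmark contains rich, subtle phenomena that current toxicity detection models may find difficult to identify, revealing a significant domain difference from social media content.
A systematic evaluation of models trained on existing toxicity datasets revealed their shortcomings when applied to the distinct domain of ToxicChat.
Our work sheds light on the potentially overlooked challenges of toxicity detection in real-world user-AI conversations.
In the future, ToxicChat could serve as a valuable resource for driving further progress toward a safe and healthy environment for user-AI interaction.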

Summary (original)

Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference compared to social media content. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

arXiv information

Authors	Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang
Published	2023-10-26 13:35:41+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
