Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection

Summary

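Classic approaches to content moderation typically apply rule-based heuristics to flag content.
While rules are easy to customize and intuitive for humans to interpret, they are inherently fragile and lack the flexibility and robustness needed to moderate the vast amount of undesirable content found online today.
Recent advances in deep learning have shown the promise of highly effective deep neural models for overcoming these challenges.
However, despite their improved performance, these data-driven models lack transparency and explainability, which often leads to mistrust from everyday users and limited adoption by many platforms.
This paper presents Rule By Example (RBE), a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation.
RBE can provide rule-grounded predictions, allowing more explainable and customizable predictions than typical deep learning-based approaches.
We demonstrate that our approach can learn rich rule embedding representations from only a few data examples.
Experimental results on three popular hate speech classification datasets show that RBE can outperform both state-of-the-art deep learning classifiers and the use of rules, in supervised and unsupervised settings, while providing explainable model predictions via rule grounding.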

Summary (original)

Classic approaches to content moderation typically apply a rule-based heuristic approach to flag content. While rules are easily customizable and intuitive for humans to interpret, they are inherently fragile and lack the flexibility or robustness needed to moderate the vast amount of undesirable content found online today. Recent advances in deep learning have demonstrated the promise of using highly effective deep neural models to overcome these challenges. However, despite the improved performance, these data-driven models lack transparency and explainability, often leading to mistrust from everyday users and a lack of adoption by many platforms. In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation. RBE is capable of providing rule-grounded predictions, allowing for more explainable and customizable predictions compared to typical deep learning-based approaches. We demonstrate that our approach is capable of learning rich rule embedding representations using only a few data examples. Experimental results on 3 popular hate speech classification datasets show that RBE is able to outperform state-of-the-art deep learning classifiers as well as the use of rules in both supervised and unsupervised settings while providing explainable model predictions via rule-grounding.
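
The idea of a rule-grounded prediction can be illustrated with a toy sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the bag-of-words `embed` function stands in for a learned encoder, the rule names and exemplar texts in `RULES` are hypothetical, and the similarity threshold is arbitrary. The sketch only shows the general pattern of comparing an input against exemplars attached to rules and returning the best-matching rule as the explanation; it is not the authors' implementation and involves no contrastive training.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical rules, each backed by a few exemplar texts (illustrative only).
RULES = {
    "dehumanizing_language": [
        "those people are vermin",
        "that group is all vermin",
    ],
    "threat_of_violence": [
        "i will hurt you",
        "we will hurt them all",
    ],
}


def predict(text, threshold=0.5):
    """Return (flagged, rule, score); the rule grounds the prediction.

    The text is flagged when its similarity to the closest exemplar
    reaches the threshold (an arbitrary value chosen for this sketch).
    """
    query = embed(text)
    best_rule, best_score = None, 0.0
    for rule, exemplars in RULES.items():
        for exemplar in exemplars:
            score = cosine(query, embed(exemplar))
            if score > best_score:
                best_rule, best_score = rule, score
    flagged = best_score >= threshold
    return flagged, (best_rule if flagged else None), best_score


if __name__ == "__main__":
    print(predict("we will hurt you"))
    print(predict("the weather is nice today"))
```

In this sketch, customizing the moderator amounts to editing `RULES`: adding a rule or an exemplar changes what is flagged without retraining anything, and every positive prediction carries the name of the rule that triggered it.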

arXiv information

Authors	Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen
Published	2023-07-24 16:55:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
