Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models

Summary

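Toxic content detection is crucial for online services that need to remove inappropriate content violating community standards.
To automate detection, prior work has proposed a variety of machine learning (ML) approaches that train language models (LMs) to detect toxic content.
However, both their accuracy and their transferability across datasets are limited.
Recently, large language models (LLMs) have shown promise in toxic content detection, thanks to their strong zero-shot and few-shot in-context learning ability and their broad transferability across ML tasks.
However, designing prompts for LLMs efficiently remains challenging.
Moreover, the high run-time cost of LLMs may hinder their deployment in production.
To address these challenges, this work proposes BD-LLM, a novel and efficient approach to bootstrapping and distilling LLMs for toxic content detection.
Specifically, the authors design a new prompting method, Decision-Tree-of-Thought (DToT), to bootstrap the detection performance of LLMs and to extract high-quality rationales.
When an LLM's response lacks confidence, DToT automatically selects more fine-grained context and re-prompts the LLM.
The rationales extracted via DToT are then used to fine-tune student LMs.
Experimental results on various datasets show that DToT can improve the accuracy of LLMs by up to 4.6%.
Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform the baselines on all datasets, with up to 16.9% higher accuracy, while being more than 60x smaller than conventional LLMs.
Finally, student LMs fine-tuned with rationales are observed to exhibit better cross-dataset transferability.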

Summary (original)

Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed varieties of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployments in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs’ detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.
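
The abstract gives no algorithmic detail for DToT, so the following is only a minimal sketch of the general idea it states: re-prompt with more fine-grained context when a response lacks confidence. Everything here is an assumption made for illustration and not the authors' method: the `ContextNode` tree, the breadth-first traversal, the 0.8 threshold, and `toy_classifier`, which stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ContextNode:
    """Hypothetical context tree: coarse context at the root, finer ones below."""
    context: str
    children: List["ContextNode"] = field(default_factory=list)


# A classifier takes (text, context) and returns (label, confidence in [0, 1]).
# In a real system this would wrap an LLM prompt; here it is only a type alias.
Classifier = Callable[[str, str], Tuple[str, float]]


def classify_with_reprompt(
    text: str,
    root: ContextNode,
    classify: Classifier,
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Tuple[str, float, List[str]]:
    """Re-prompt with finer context while the best answer lacks confidence.

    Returns the most confident (label, confidence) seen and the contexts tried.
    """
    best_label, best_conf = classify(text, root.context)
    tried = [root.context]
    frontier = list(root.children)  # finer contexts, visited breadth-first
    while best_conf < threshold and frontier:
        node = frontier.pop(0)
        label, conf = classify(text, node.context)
        tried.append(node.context)
        if conf > best_conf:
            best_label, best_conf = label, conf
        frontier.extend(node.children)
    return best_label, best_conf, tried


def toy_classifier(text: str, context: str) -> Tuple[str, float]:
    """Toy stand-in for an LLM: confident only if the context keyword is in the text."""
    keyword = context.split(":")[-1].strip().lower()
    if keyword in text.lower():
        return "toxic", 0.9
    return "non-toxic", 0.5


# Hypothetical two-level tree of contexts.
tree = ContextNode(
    "scope: toxicity",
    [ContextNode("scope: threat"), ContextNode("scope: insult")],
)

print(classify_with_reprompt("That was a stupid insult.", tree, toy_classifier))
```

In this toy run the root context yields a low-confidence answer, so the finer contexts are tried in turn until one clears the threshold. The list of contexts tried is returned alongside the label because the abstract says rationales are extracted for later fine-tuning; how those rationales are actually formed is not specified in the abstract.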

arXiv information

Authors	Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos Psounis
Published	2023-12-13 17:22:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
