People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Abstract (original)

NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
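The abstract defines a CAD as a copy of a training example that has been minimally edited and had its label flipped. The sketch below only illustrates that definition. It is not the paper's pipeline. It assumes a hypothetical `CADPair` record whose labels are already assigned, for example by annotators. It also measures the size of an edit as a word-level Levenshtein distance. That measure is an illustrative choice; the abstract does not say how the paper quantifies minimality or checks label flips.

```python
from dataclasses import dataclass


@dataclass
class CADPair:
    """Hypothetical record: an original example and its counterfactual edit."""
    original: str
    original_label: str
    counterfactual: str
    counterfactual_label: str


def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated tokens (illustrative metric)."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, si in enumerate(s, 1):
        cur = [i]
        for j, tj in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # delete a token from s
                           cur[j - 1] + 1,            # insert a token into s
                           prev[j - 1] + (si != tj)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def describe(pair: CADPair) -> dict:
    """Report how large the edit is and whether the assigned label actually flipped."""
    return {
        "edits": token_edit_distance(pair.original, pair.counterfactual),
        "label_flipped": pair.original_label != pair.counterfactual_label,
    }


if __name__ == "__main__":
    pair = CADPair("you are an idiot", "harmful",
                   "you are a genius", "not harmful")
    print(describe(pair))
```

A small edit count alone does not make a pair useful. The abstract's closing point is that automated edits often fail to flip the label. In this sketch, `label_flipped` is only as reliable as the labels assigned to the counterfactual.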

arXiv information

Authors	Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, Claudia Wagne
Published	2023-11-02 14:31:25+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google

