RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

Summary

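Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99% or higher).
However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the evaluation datasets are not challenging enough: they lack variation in sampling strategies, adversarial attacks, and open-source generative models.
In this work, we present RAID, the largest and most challenging benchmark dataset for machine-generated text detection.
RAID contains over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies.
Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models.
We release our dataset and tools to encourage further research into detector robustness.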

Summary (original)

Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99% or higher). However, very few of these detectors are evaluated on shared benchmark datasets and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration into detector robustness.

arXiv information

Authors	Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch
Published	2024-05-13 17:15:14+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
