Beemo: Benchmark of Expert-edited Machine-generated Outputs

Summary

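The rapid spread of large language models (LLMs) has increased the volume of machine-generated text (MGT) and blurred text authorship across many domains.
However, most existing MGT benchmarks consist of single-author texts (human-written or machine-generated).
This conventional design fails to capture the more practical multi-author scenario in which a user refines an LLM response for natural flow, coherence, and factual correctness.
Our paper introduces the Benchmark of Expert-edited Machine-generated Outputs (Beemo), which includes 6.5k texts written by humans, generated by ten instruction-finetuned LLMs, and edited by experts for use cases ranging from creative writing to summarization.
Beemo additionally comprises 13.1k machine-generated and LLM-edited texts, allowing diverse MGT detection evaluation across edit types.
We document Beemo's creation protocol and present the results of benchmarking 33 configurations of MGT detectors in different experimental setups.
We find that expert-based editing evades MGT detection, while LLM-edited texts are unlikely to be recognized as human-written.
Beemo and all materials are publicly available.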

Summary (original)

The rapid proliferation of large language models (LLMs) has increased the volume of machine-generated texts (MGTs) and blurred text authorship in various domains. However, most existing MGT benchmarks include single-author texts (human-written and machine-generated). This conventional design fails to capture more practical multi-author scenarios, where the user refines the LLM response for natural flow, coherence, and factual correctness. Our paper introduces the Benchmark of Expert-edited Machine-generated Outputs (Beemo), which includes 6.5k texts written by humans, generated by ten instruction-finetuned LLMs, and edited by experts for various use cases, ranging from creative writing to summarization. Beemo additionally comprises 13.1k machine-generated and LLM-edited texts, allowing for diverse MGT detection evaluation across various edit types. We document Beemo’s creation protocol and present the results of benchmarking 33 configurations of MGT detectors in different experimental setups. We find that expert-based editing evades MGT detection, while LLM-edited texts are unlikely to be recognized as human-written. Beemo and all materials are publicly available.

arXiv information

Authors	Ekaterina Artemova, Jason Lucas, Saranya Venkatraman, Jooyoung Lee, Sergei Tilga, Adaku Uchendu, Vladislav Mikhailov
Published	2024-11-06 16:31:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
