GenAI Content Detection Task 2: AI vs. Human — Academic Essay Authenticity Challenge

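Summary

This paper gives a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. The challenge focuses on distinguishing machine-generated essays from human-authored essays written for academic purposes.
The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge covers two languages: English and Arabic.
During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task.
Ultimately, seven teams submitted system description papers.
Most submissions used fine-tuned transformer-based models, and one team employed large language models (LLMs) such as Llama 2 and Llama 3. The paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework.
It also summarizes the approaches adopted by the participating teams.
Nearly all submitted systems outperformed the n-gram-based baseline, and the top-performing systems achieved F1 scores above 0.98 for both languages, indicating significant progress in the detection of machine-generated text.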

Summary (original)

This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes. The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, seven teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.
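To make the reported metric concrete, the following is a minimal sketch of how an F1 score can be computed for a two-class essay detector. It assumes binary F1 with the machine-generated class as the positive class; the abstract does not say which averaging the organizers used, and the label strings and the toy predictions are hypothetical, not taken from the challenge data.

```python
def f1_score(gold, pred, positive="machine"):
    """Binary F1 for one positive class (assumed here: 'machine')."""
    pairs = list(zip(gold, pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, for illustration only.
gold = ["machine", "human", "machine", "human", "machine"]
pred = ["machine", "human", "human", "human", "machine"]
score = f1_score(gold, pred)
print(f"F1 = {score:.2f}")
```

In this toy case one machine-generated essay is missed, so recall drops to 2/3 while precision stays at 1, and the harmonic mean penalizes the imbalance.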

arXiv information

Authors	Shammur Absar Chowdhury, Hind Almerekhi, Mucahid Kutlu, Kaan Efe Keles, Fatema Ahmad, Tasnim Mohiuddin, George Mikros, Firoj Alam
Published	2024-12-24 08:33:44+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
